Abstract

We study a distributed particle filter proposed by Bolic et al. (2005). This
algorithm involves m groups of M particles, with interaction between groups
occurring through a “local exchange” mechanism. We establish a central limit
theorem in the regime where M is fixed and $m \to \infty$. A formula we obtain
for the asymptotic variance can be interpreted in terms of colliding Markov
chains, enabling analytic and numerical evaluations of how the asymptotic
variance behaves over time, with comparison to a benchmark algorithm consisting
of m independent particle filters. We prove that subject to regularity
conditions, when m is fixed both algorithms converge time-uniformly at rate
$M^{-1/2}$. Through use of our asymptotic variance formula we give counter-examples
satisfying the same regularity conditions to show that when M is fixed neither
algorithm, in general, converges time-uniformly at rate $m^{-1/2}$.
1. Introduction


Since their introduction in [T], particle filters have become very popular
tools in engineering, signal processing, econometrics and various other
disciplines for approximate nonlinear filtering of hidden Markov models (HMMs).
Investigations of particle filters have generated book-length studies, notably [5], 
demonstrating the well-developed state of knowledge about convergence rates, 
fluctuations, propagation of chaos, large deviations and various other properties, 
with more recent contributions to the literature focussing on specific algorithmic 
mechanisms, such as adaptive resampling Eng. 

Trends in the development of computers towards distributed and parallel 
architectures have influenced particle filtering methodology. One of the main 
bottlenecks for computational efficiency when implementing particle filters is the 
interaction between particles which occurs in the resampling step. This step is 
important because it ensures that the algorithm exhibits certain time-uniform 
convergence properties, but is difficult to parallelize. 

A significant piece of work from the engineering literature which addresses
this difficulty is [5], introducing an algorithm we refer to as the Local Exchange
Particle Filter (LEPF), in which groups of particles are spread across
computational units. What makes this algorithm unusual is that the m groups of M
weighted particles interact through an “exchange” mechanism, which places it
outside the frameworks of many existing studies, notably [aiaii]. The practical
rationale for the LEPF is to achieve a compromise between communication
efficiency of the algorithm and the benefits brought about by resampling. In
particular the interaction between particles in the LEPF occurs in a localized 
manner, making it suited to implementation on a network of computing devices 
without the need for global connections. 

Despite substantial interest in [5] from practitioners (it has 250 citations
according to Google Scholar at the time of writing), relatively little is known
about convergence properties of the LEPF. Indeed the question of whether it truly
exhibits the same time-uniform convergence properties as the original particle
filter of [T] has not been fully answered. The few papers on analysis of the LEPF
appear to be 017! and the recent technical report M- i concerns analysis over 
a single time-step, and [7, 8] provide proofs of time-uniform convergence of the
particle filtering approximation error, in $L_1$ and $L_p$ norms respectively, in the
regime where M is fixed and $m \to \infty$, for an algorithm of which the LEPF as we
present it is a special case. However, the proofs of [7, 8] rely on key hypotheses on
the particle weights which they do not rigorously verify, and which seem difficult
to check in general. The results of [7, 8] also do not establish a particular rate
of convergence.

The structure of this paper and outline of our main contributions are as
follows (precise statements are given later). In Section 2 we introduce the setup
of the filtering problem, present the LEPF and describe the main result of [7, 8].
We also introduce a standard algorithm consisting of m independent bootstrap 
particle filters (IBPF), each with M particles. The independence in the IBPF 
makes it very easy to parallelize, so from a computational point of view it is a 
natural alternative to the LEPF. In this paper the convergence properties of the 
IBPF, which are already well-understood, serve as benchmarks against which 
to compare the LEPF. 

Section 3 introduces a general algorithm of which the LEPF and IBPF are
special cases, and gives our main result, Theorem 1, a central limit theorem
(CLT) for the error in particle approximation of prediction filter distributions, in
the regime where M is fixed and $m \to \infty$. We address time-uniform convergence
in Section 4. Our first result here is a positive one: that under strong but
standard regularity conditions, in $L_p$ norm the error from the LEPF converges
time-uniformly with rate $M^{-1/2}$ in the regime where m is fixed and $M \to \infty$.
The same is true of the IBPF. Our second result, a proposition in Section 4.2,
shows that growth without bound of the asymptotic variance in our CLT is
sufficient to rule out time-uniform convergence at rate $m^{-1/2}$ in the regime
where M is fixed and $m \to \infty$.

Section 5 investigates various properties of the asymptotic variance for the
LEPF and compares them to those of the IBPF. In particular, we show by
examples in Sections 5.1 and 5.2 that under conditions which can be considered
very favourable for performance, the asymptotic variance for the LEPF and
IBPF can grow over time without bound. This can be considered a negative
result for the LEPF, since the sequence of asymptotic variances (over time) for
the original particle filter of [T] has been shown under weaker conditions to be
bounded, or tight when the observations in the HMM are treated as random
[iiiniiiiiiiiiis]. Moreover, combined with the proposition in Section 4.2, these
examples serve as counter-examples to time-uniform convergence at rate $m^{-1/2}$.
This does not contradict the time-uniform convergence results of [7, 8], since
the latter results do not pertain to a specific convergence rate, and they concern
the updated filtering distributions. However, our proposition allows us to
confirm that a hypothesis slightly stronger than that of [8] does not hold in
general, even under favourable conditions. A later section contains further
discussion and interpretation of our results. Our analysis allows us to explain
qualitatively why the asymptotic variance for the LEPF may be lower or grow over
time more slowly than that for the IBPF, and we illustrate this phenomenon with
numerical results.

Some clarifications about originality are in order. To the knowledge of the
authors, our CLT is the first result of its kind for the LEPF. Our starting point
to prove this result consists of a martingale decomposition and error bounds,
Proposition 1 in Section 3.2, which is an application of a result obtained by the
authors in [T3] for a class of algorithms which includes the LEPF. However, we
emphasise that Proposition 1 is only one of the first steps towards the CLT
itself, leaving us with substantial work to do. In our study of time-uniform
convergence, we also appeal to a result of [T3] (restated as a proposition in the
present paper), but again we have some work to do in dealing with the specifics of
the LEPF. We also point out that despite some superficial similarities, the details
of LEPF and our analysis differ substantially from those of some resampling
algorithms studied recently by the authors in m- 
Notation

For any measurable space $(\mathsf{X}, \mathcal{X})$ we use $\mathcal{M}(\mathsf{X})$,
$\mathcal{P}(\mathsf{X})$ and $\mathcal{B}(\mathsf{X})$ to denote
the set of measures, probability measures and the set of bounded and
measurable functions defined on $\mathsf{X}$, respectively. $\mathbb{N}$ includes 0.
For any $\mathbb{N}$-valued $m \ge 1$ we write $[m] := \{1, \ldots, m\}$. Whenever
summation over a single variable appears without the summation set made explicit,
the sum is taken over the set $[N]$, i.e. $\sum_i := \sum_{i \in [N]}$; for summations
over multiple variables we write $\sum_{(i_1,\ldots,i_p)} := \sum_{i_1} \cdots \sum_{i_p}$.
We use $\mathrm{Id}$ to denote the identity mapping for
any domain of definition and 1 to denote a constant function equal to 1
everywhere. For any function $\varphi : \mathsf{X} \to \mathbb{R}$, we define
$\varphi^{\otimes 2}(x,y) := \varphi(x)\varphi(y)$ for all $x, y \in \mathsf{X}$.
For $\varphi \in \mathcal{B}(\mathsf{X})$ we define
$\|\varphi\|_\infty := \sup_x |\varphi(x)|$ and
$\mathrm{osc}(\varphi) := \sup_{x,y} |\varphi(x) - \varphi(y)|$.
For any $\mu, \nu \in \mathcal{M}(\mathsf{X})$, $\mu \otimes \nu$ denotes the product
measure and $\mu^{\otimes 2} := \mu \otimes \mu$. We use $\delta_x$ to denote the point
mass located at x. We define
$\lfloor x \rfloor := \max\{z \in \mathbb{Z} : z \le x\}$ and
$(y \bmod x) := y - \lfloor (y-1)/x \rfloor x$. All random
variables we encounter are considered to be defined on some underlying
probability space $(\Omega, \mathcal{F}, \mathbb{P})$, with expectation w.r.t.
$\mathbb{P}$ denoted by $\mathbb{E}$. Convergence in
probability under $\mathbb{P}$ is denoted by $\xrightarrow{\mathbb{P}}$.
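The convention for $(y \bmod x)$ above differs from the usual remainder: it takes values in $\{1, \ldots, x\}$ rather than $\{0, \ldots, x-1\}$, which is what makes it suitable for 1-based particle labels. A minimal Python sketch of this convention (the function name `mod1` is ours and does not appear in the paper):

```python
def mod1(y: int, x: int) -> int:
    """(y mod x) := y - floor((y - 1) / x) * x, with values in {1, ..., x}."""
    # Python's // is floor division, so it matches the floor in the definition
    # even when y - 1 is negative.
    return y - ((y - 1) // x) * x


# Labels wrap around 1, ..., x instead of 0, ..., x - 1.
wrapped = [mod1(y, 3) for y in range(1, 8)]
```

In particular a multiple of x is mapped to x itself, not to 0.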

2. Filtering framework and the LEPF

Let $X = (X_n)_{n \in \mathbb{N}}$ be a Markov chain taking values in a measurable Polish
space $(\mathsf{X}, \mathcal{X})$, having initial distribution
$\pi_0 \in \mathcal{P}(\mathsf{X})$ and transition kernel
$f : \mathsf{X} \times \mathcal{X} \to [0,1]$,

\[ X_0 \sim \pi_0, \qquad X_n \sim f(X_{n-1}, \cdot), \quad \forall n \ge 1. \]

Let $Y = (Y_n)_{n \in \mathbb{N}}$ be a process taking values in a measurable Polish space
$(\mathsf{Y}, \mathcal{Y})$ such that $(Y_n)_{n \in \mathbb{N}}$ are conditionally
independent given X, with the conditional distribution of $Y_n$ given X being

\[ Y_n \sim g(X_n, \cdot), \quad \forall n \in \mathbb{N}, \]

for a probability kernel $g : \mathsf{X} \times \mathcal{Y} \to [0,1]$. For all
$x \in \mathsf{X}$, we assume $g(x, \cdot)$ admits a density with respect to a
$\sigma$-finite measure on $(\mathsf{Y}, \mathcal{Y})$, and the same
notation $g(x, \cdot)$ will be used for denoting this density. From here on, we consider
a fixed $\mathsf{Y}$-valued observation sequence $(y_n)_{n \in \mathbb{N}}$, write
$g_n(x) := g(x, y_n)$ for all $x \in \mathsf{X}$, and assume that the following mild
regularity condition holds.

Assumption 1. For all $n \in \mathbb{N}$, $g_n \in \mathcal{B}(\mathsf{X})$ and
$g_n(x) > 0$ for all $x \in \mathsf{X}$.

We focus on approximating the $\mathcal{P}(\mathsf{X})$-valued prediction filter sequence
$(\pi_n)_{n \in \mathbb{N}}$, which cannot be computed exactly, except in some special
cases. This sequence is defined for all $n \ge 1$, by the recursion
$\pi_n = \Phi_n(\pi_{n-1})$ where
$\Phi_n : \mathcal{P}(\mathsf{X}) \to \mathcal{P}(\mathsf{X})$ is the operator

\[ \Phi_n(\mu)(A) := \frac{\int_{\mathsf{X}} g(x, y_{n-1}) f(x, A)\, \mu(dx)}{\int_{\mathsf{X}} g(x, y_{n-1})\, \mu(dx)}, \quad \forall A \in \mathcal{X},\ \mu \in \mathcal{P}(\mathsf{X}). \]

If $(y_n)_{n \in \mathbb{N}}$ is replaced by the random sequence $(Y_n)_{n \in \mathbb{N}}$,
then $\pi_n$ is a version of the conditional distribution of $X_n$ given
$Y_0, \ldots, Y_{n-1}$.
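On a finite state space the prediction filter recursion can be computed exactly, which gives a convenient reference when checking particle approximations. The following is a minimal sketch under that assumption; the two-state model and all numerical values are made up for illustration and are not from the paper.

```python
def prediction_step(pi_prev, g_prev, f):
    """pi_n = Phi_n(pi_{n-1}) on a finite state space {0, ..., d-1}.

    pi_prev[x] : pi_{n-1}({x})
    g_prev[x]  : g(x, y_{n-1}), likelihood of the previous observation
    f[x][a]    : transition probability f(x, {a})
    """
    d = len(pi_prev)
    weighted = [pi_prev[x] * g_prev[x] for x in range(d)]
    norm = sum(weighted)  # denominator of Phi_n
    return [sum(weighted[x] * f[x][a] for x in range(d)) / norm
            for a in range(d)]


# Hypothetical two-state example.
pi0 = [0.5, 0.5]
f = [[0.9, 0.1], [0.2, 0.8]]
g0 = [0.7, 0.1]
pi1 = prediction_step(pi0, g0, f)
```

Iterating `prediction_step` over the observation sequence yields the whole sequence of prediction filters for the toy model.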

The algorithm which is our main object of study is one of several proposed
in [5] and there called the “Distributed Resampling with Non-proportional
Allocation and Local Exchange” algorithm. For brevity, we refer to it as the LEPF.
It is shown in Algorithm 1. At each time step n, this algorithm delivers a
collection of $N = Mm$ particles $\zeta_n = \{\zeta_n^i : i \in [N]\}$ and weights
$\{W_n^i : i \in [N]\}$, and the weighted empirical measure

\[ \pi_n^N := \frac{\sum_i W_n^i\, \delta_{\zeta_n^i}}{\sum_i W_n^i} \tag{1} \]

is regarded as an approximation to $\pi_n$. The sampling steps of Algorithm 1
should be understood to mean that the particles $\zeta_n = \{\zeta_n^i : i \in [N]\}$ are
conditionally independent given $\zeta_0, \ldots, \zeta_{n-1}$. Within each of the m
groups of equal size M, the particles are drawn according to a common
resampling/proposal mechanism. Indeed one can read off from Algorithm 1 that

\[ W_n^i = W_n^j \quad\text{and}\quad \mathbb{P}(\zeta_n^i \in \cdot \mid \zeta_0, \ldots, \zeta_{n-1}) = \mathbb{P}(\zeta_n^j \in \cdot \mid \zeta_0, \ldots, \zeta_{n-1}), \quad \forall i, j \in G_k,\ k \in [m], \tag{2} \]

and the parameter $\theta \in \{1, \ldots, M-1\}$ influences the interaction between groups
via the indices $L^i$.
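Algorithm 1 Local exchange particle filter
for $i = 1, \ldots, Mm$
    Set $W_0^i = 1$ and sample $\zeta_0^i \sim \pi_0$
    Set $L^i = (i + \theta) \bmod Mm$
for $k = 1, \ldots, m$
    Set $G_k = \{(k-1)M + 1, \ldots, (k-1)M + M\}$
for $n = 1, 2, \ldots$
    for $k = 1, \ldots, m$
        for $i \in G_k$
            Set $W_n^i = M^{-1} \sum_{j \in G_k} W_{n-1}^{L^j} g_{n-1}(\zeta_{n-1}^{L^j})$
            Sample $\zeta_n^i \mid \zeta_0, \ldots, \zeta_{n-1} \sim \dfrac{\sum_{j \in G_k} W_{n-1}^{L^j} g_{n-1}(\zeta_{n-1}^{L^j})\, f(\zeta_{n-1}^{L^j}, \cdot)}{\sum_{j \in G_k} W_{n-1}^{L^j} g_{n-1}(\zeta_{n-1}^{L^j})}$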
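The steps of the local exchange particle filter can be sketched in a few lines of Python. This is an illustrative sketch only: the scalar model (standard normal initial law, AR(1) transition with coefficient 0.9, Gaussian likelihood) is an assumption made here, and the names `lepf`, `estimate` and `mod1` are ours. The group weight uses the factor $1/M$, which is what row-stochasticity of the matrix $\alpha$ introduced in Section 3 requires.

```python
import math
import random


def mod1(y, x):
    # The paper's mod convention: values in {1, ..., x}.
    return y - ((y - 1) // x) * x


def lepf(M, m, theta, ys, rng):
    """Local exchange particle filter on a toy scalar model (assumed here):
    pi_0 = N(0, 1), f(x, .) = N(0.9 x, 1), g(x, y) = exp(-(y - x)^2 / 2).
    ys holds y_0, y_1, ...; one time step is run per observation.
    Returns the final weights and particles as 0-based lists.
    """
    N = M * m
    # 1-based particle labels as in the paper; slot 0 is unused.
    L = [0] + [mod1(i + theta, N) for i in range(1, N + 1)]
    groups = [list(range((k - 1) * M + 1, k * M + 1)) for k in range(1, m + 1)]
    W = [0.0] + [1.0] * N
    zeta = [0.0] + [rng.gauss(0.0, 1.0) for _ in range(N)]
    for y_prev in ys:
        W_new = [0.0] * (N + 1)
        zeta_new = [0.0] * (N + 1)
        for G in groups:
            src = [L[j] for j in G]  # exchanged indices L^j, j in G_k
            w = [W[s] * math.exp(-0.5 * (y_prev - zeta[s]) ** 2) for s in src]
            for i in G:
                W_new[i] = sum(w) / M  # common weight within the group
                parent = rng.choices(src, weights=w)[0]  # local resampling
                zeta_new[i] = 0.9 * zeta[parent] + rng.gauss(0.0, 1.0)
        W, zeta = W_new, zeta_new
    return W[1:], zeta[1:]


def estimate(W, zeta, phi):
    # Weighted empirical measure applied to a test function phi.
    return sum(w * phi(z) for w, z in zip(W, zeta)) / sum(W)


W, zeta = lepf(M=3, m=4, theta=1, ys=[0.1, -0.3, 0.5], rng=random.Random(0))
```

Calling `lepf` with `theta=0` removes the exchange, so each group only ever uses its own particles.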


In this paper, we primarily focus on the asymptotic regime M fixed, $m \to \infty$.
Interest in this regime stems from parallel and distributed implementations:
typically the sampling and weight computations for the m groups are performed
concurrently by a network of m computers, so the regime M fixed, $m \to \infty$ can
be thought of as corresponding to an increasingly large network, in which each
computer handles M particles, see [5] for details.

[7, 8] studied an algorithm of which the LEPF as we present it in Algorithm 1
is a special case. Our mapping $i \mapsto L^i$ is a particular instance of the mapping
denoted by $\beta$ in [7, 8] and if one sets their exchange period parameter $n_0 = 1$,
one recovers Algorithm 1. The generality of $\beta$ in [7, 8] allows for other
patterns of interaction between particles, beyond the ones considered in the
present article. Whilst we focus on the prediction filter distributions $\pi_n$, [7, 8]
focus on particle approximations of the updated filtering distributions
$\hat{\pi}_n(A) := \pi_n(g_n \mathbb{I}_A)/\pi_n(g_n)$, $A \in \mathcal{X}$, $n \ge 0$.
To allow us to state their result, for each $n \ge 0$
let $\{\hat{\zeta}_n^i : i \in [N]\}$ be random variables which are conditionally
independent given $\zeta_0, \ldots, \zeta_n$, with


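\[ \mathbb{P}(\hat{\zeta}_n^i \in \cdot \mid \zeta_0, \ldots, \zeta_n) = \frac{\sum_{j \in G_k} W_n^{L^j} g_n(\zeta_n^{L^j})\, \delta_{\zeta_n^{L^j}}(\cdot)}{\sum_{j \in G_k} W_n^{L^j} g_n(\zeta_n^{L^j})}, \quad \forall i \in G_k, \]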
and

\[ \hat{\pi}_n^N := \frac{\sum_i W_{n+1}^i\, \delta_{\hat{\zeta}_n^i}}{\sum_i W_{n+1}^i}. \]

The $\{\hat{\zeta}_n^i : i \in [N]\}$ can be understood as “integrated out” in Algorithm 1.

In the notation of the present paper and with := for all i,j S 

Gk and k G [m], the key hypothesis of [H Assumption 3] can be equivalently 
written as follows: there exist e G [0,1) and g > 4 such that, 


sup sup 

m>l n>0 


E 




max 


91 


< oo. 


( 3 ) 


feG[m] Wn 

Under this hypothesis, plus additional but standard regularity conditions, the
main result of [8] is: for any $\varphi \in \mathcal{B}(\mathsf{X})$, $M \ge 1$ and
$1 \le p < q$ with q as in (3),

\[ \lim_{m \to \infty} \sup_{n \ge 0} \mathbb{E}\bigl[ |\hat{\pi}_n^N(\varphi) - \hat{\pi}_n(\varphi)|^p \bigr]^{1/p} = 0. \]

A similar result for the case p = 1 was established in [7] under stronger
conditions. However, in [8] the hypothesis (3) is not rigorously verified, and only
empirical evidence that it holds is presented. We shall comment further on (3)
later in the paper.

The role of the indices $L^i$ in the LEPF is made more transparent if one
compares to an alternative algorithm, what we term independent bootstrap particle
filters (IBPF), shown in Algorithm 2 below. The IBPF amounts to m independent
copies of the original bootstrap particle filter of [T], each with M = N/m
particles. Indeed one can read off from Algorithm 2 that for the IBPF the m
collections of particles $\{\zeta_n^i : i \in G_k, n \in \mathbb{N}\}$, $k \in [m]$, are
independent, making the IBPF very easy to parallelise and hence in practice it is a
natural alternative to the LEPF. Algorithm 2 also clearly satisfies (2), and one
could write the “Sample” step more simply as:


\[ \mathbb{P}(\zeta_n^i \in \cdot \mid \zeta_0, \ldots, \zeta_{n-1}) = \frac{\sum_{j \in G_k} g_{n-1}(\zeta_{n-1}^j)\, f(\zeta_{n-1}^j, \cdot)}{\sum_{j \in G_k} g_{n-1}(\zeta_{n-1}^j)}, \quad \forall i \in G_k, \]

but the presentation of Algorithm 2 highlights the connection to the LEPF: if in
Algorithm 1 one were to set $\theta = 0$, so $L^i = i$, then one recovers exactly
Algorithm 2. With the weights $W_n^i$ as calculated in the IBPF, one again regards
$\pi_n^N$ as in (1) as an approximation to $\pi_n$, and the statistical independence
between groups means that convergence properties of the IBPF in the regime where M
is fixed and $m \to \infty$ are relatively easy to study.


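Algorithm 2 Independent bootstrap particle filters
for $i = 1, \ldots, Mm$
    Set $W_0^i = 1$ and sample $\zeta_0^i \sim \pi_0$
for $k = 1, \ldots, m$
    Set $G_k = \{(k-1)M + 1, \ldots, (k-1)M + M\}$
for $n = 1, 2, \ldots$
    for $k = 1, \ldots, m$
        for $i \in G_k$
            Set $W_n^i = M^{-1} \sum_{j \in G_k} W_{n-1}^j g_{n-1}(\zeta_{n-1}^j)$
            Sample $\zeta_n^i \mid \zeta_0, \ldots, \zeta_{n-1} \sim \dfrac{\sum_{j \in G_k} W_{n-1}^j g_{n-1}(\zeta_{n-1}^j)\, f(\zeta_{n-1}^j, \cdot)}{\sum_{j \in G_k} W_{n-1}^j g_{n-1}(\zeta_{n-1}^j)}$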


3. Central limit theorem 

3.1. A general algorithm and statement of the main result 

The starting point for our analysis is to write down Algorithm 3, of which
the LEPF and IBPF are special cases. We do this not just for the sake of
generality. Instead Algorithm 3 affords us some notational simplifications and,
more crucially, it allows us to make clear that the LEPF is a special case of the
so-called $\alpha$SMC algorithm, introduced by the authors in [T3]. In turn this later
allows us to leverage some results of [T3] (in particular Proposition 1 below),
providing some building blocks for our CLT. The IBPF is also an instance of
Algorithm 3 and this fact eases our presentation of comparisons between it and
the LEPF in Section 5.

Henceforth, the integer $M \ge 1$ is, unless stated otherwise, assumed
to be fixed. In Algorithm 3, $\alpha$ is a row-stochastic matrix, of size $N \times N$, with
N = Mm. Assumption 2 introduces hypotheses on the matrix $\alpha$ for each value
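Algorithm 3
for $i = 1, \ldots, N$
    Set $W_0^i = 1$ and sample $\zeta_0^i \sim \pi_0$
for $n = 1, 2, \ldots$
    for $i = 1, \ldots, N$
        Set $W_n^i = \sum_j \alpha^{ij} W_{n-1}^j g_{n-1}(\zeta_{n-1}^j)$
        Sample $\zeta_n^i \mid \zeta_0, \ldots, \zeta_{n-1} \sim \dfrac{\sum_j \alpha^{ij} W_{n-1}^j g_{n-1}(\zeta_{n-1}^j)\, f(\zeta_{n-1}^j, \cdot)}{W_n^i}$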
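One time step of the general algorithm can be sketched as follows. This is a hedged illustration rather than the authors' implementation: the function name `alpha_smc_step` and its arguments `g` (the potential $g_{n-1}$) and `sample_f` (a sampler from $f(x, \cdot)$) are hypothetical names introduced here.

```python
import random


def alpha_smc_step(alpha, W, zeta, g, sample_f, rng):
    """One step n-1 -> n of the general algorithm for a row-stochastic alpha.

    alpha    : N x N list of lists, alpha[i][j] = alpha^{ij} (0-based)
    W, zeta  : weights and particles at time n-1
    g        : function x -> g_{n-1}(x)
    sample_f : function (x, rng) -> a draw from f(x, .)
    """
    N = len(W)
    W_new, zeta_new = [], []
    for i in range(N):
        terms = [alpha[i][j] * W[j] * g(zeta[j]) for j in range(N)]
        W_new.append(sum(terms))  # W_n^i
        # Pick a parent j with probability proportional to its term,
        # then move it with the transition kernel f.
        parent = rng.choices(range(N), weights=terms)[0]
        zeta_new.append(sample_f(zeta[parent], rng))
    return W_new, zeta_new


# Toy check with the uniform matrix alpha^{ij} = 1/N (all values hypothetical).
alpha = [[0.25] * 4 for _ in range(4)]
W1, zeta1 = alpha_smc_step(alpha, [1.0] * 4, [0.0, 1.0, 2.0, 3.0],
                           lambda x: x + 1.0, lambda x, rng: x,
                           random.Random(0))
```

With the uniform matrix every particle receives the same weight, the average of $W_{n-1}^j g_{n-1}(\zeta_{n-1}^j)$ over all j, so the step reduces to a standard bootstrap filter step with globally averaged weights.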


$N \in \{Mm : m \ge 1\}$. To state these hypotheses precisely, we need to be clear
about dependence of $\alpha$ on N and hence write $\alpha_N$ up until the end of Section
3.1, beyond which we revert to $\alpha$ to reduce notational clutter.

Assumption 2. For all $N \in \{Mm : m \ge 1\}$,

(2.1) $\alpha_N$ is doubly stochastic,

(2.2) for all $i, j \in [N]$ and $z \in \mathbb{Z}$,

\[ \alpha_N^{ij} = \alpha_N^{(i+zM) \bmod N,\ (j+zM) \bmod N}. \]

Additionally, for some integer $\beta \ge 1$,

(2.3) $\alpha_N^{ij} = 0$ for $N > 2\beta + 1$ and $i, j \in [N]$ such that
$\Delta(i,j) := \min_{z \in \mathbb{Z}} |i - j + zN| > \beta$,

(2.4) there exists $\{\alpha_\infty^{ij} : i, j \in \mathbb{Z}\}$ such that for $N > 2\beta + 1$,

= bjGZ- (4) 


Assumption (2.1) allows us to apply results from [T3] to Algorithm 3.
Assumption (2.2) asserts that the elements on each diagonal of $\alpha_N$ are periodic
with cycle length M. Intuitively, this captures the idea that the N particles
in Algorithm 3 are in some sense organised into groups of size M. It is easily
verified that the function $\Delta$ appearing in Assumption (2.3) is a metric on $[N]$,
in particular it is the graph distance on a cycle graph with vertex set $[N]$ where
there is an edge between each $i \in [N]$ and $(i + 1) \bmod N$. Assumption (2.3)
then asserts that $\alpha_N$ is a band matrix in the sense that elements further than
$\beta$ away from the main diagonal in metric $\Delta$ are equal to zero, in turn
influencing the conditional independence structure of the particles in Algorithm 3.
Finally Assumption (2.4) can be interpreted as meaning that there is some common
structure to the matrices $\alpha_N$ as N grows, and loosely speaking, this common
structure is captured in the “limiting” doubly infinite matrix $\alpha_\infty$, which will
show up later in our CLT.

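Let us now state how the LEPF and IBPF fit in this framework. Consider

\[ \alpha_N^{ij} = M^{-1}\, \mathbb{I}\bigl[ \lfloor (i-1)/M \rfloor = \lfloor (((j - \theta) \bmod N) - 1)/M \rfloor \bigr], \quad \forall i, j \in [N], \]
\[ \alpha_\infty^{ij} = M^{-1}\, \mathbb{I}\bigl[ \lfloor (i-1)/M \rfloor = \lfloor (j - \theta - 1)/M \rfloor \bigr], \quad \forall i, j \in \mathbb{Z}. \tag{5} \]

It is a matter of elementary but tedious manipulations to show that with
$\alpha = \alpha_N$ as in (5), Algorithm 3 reduces to the LEPF as in Algorithm 1, and to
check that Assumptions (2.1)-(2.3) hold with $\beta = M - 1 + \theta$. Checking
Assumption (2.4) involves some less trivial work and a proof is provided in the
Appendix.

To recover the IBPF from Algorithm 3, we take

\[ \alpha_N^{ij} = M^{-1}\, \mathbb{I}\bigl[ \lfloor (i-1)/M \rfloor = \lfloor (j-1)/M \rfloor \bigr], \quad \forall i, j \in [N], \]
\[ \alpha_\infty^{ij} = M^{-1}\, \mathbb{I}\bigl[ \lfloor (i-1)/M \rfloor = \lfloor (j-1)/M \rfloor \bigr], \quad \forall i, j \in \mathbb{Z}. \tag{6} \]

With $\beta = M - 1$, checking Assumptions (2.1)-(2.3) is again elementary, and in
this case Assumption (2.4) is obviously satisfied.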
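The "elementary but tedious" checks of Assumptions (2.1)-(2.3) can be carried out numerically for small N. The sketch below builds the finite matrices for the LEPF and IBPF with exact rational arithmetic and tests double stochasticity, periodicity along diagonals and the band property in the cycle metric. The function names are ours, and the prefactor $1/M$ in the LEPF matrix is inferred from row-stochasticity rather than read directly from the source.

```python
from fractions import Fraction


def mod1(y, x):
    # The paper's mod convention: values in {1, ..., x}.
    return y - ((y - 1) // x) * x


def alpha_lepf(M, m, theta):
    """N x N matrix for the LEPF; entry [i-1][j-1] holds alpha^{ij}."""
    N = M * m
    group = lambda i: (i - 1) // M
    return [[Fraction(1, M) if group(i) == group(mod1(j - theta, N))
             else Fraction(0)
             for j in range(1, N + 1)] for i in range(1, N + 1)]


def alpha_ibpf(M, m):
    # The IBPF matrix coincides with the LEPF one when theta = 0.
    return alpha_lepf(M, m, 0)


def cycle_dist(i, j, N):
    # Graph distance on the cycle with vertex set {1, ..., N}.
    d = abs(i - j) % N
    return min(d, N - d)


def check_assumptions(alpha, M, beta):
    """Return (rows sum to 1, columns sum to 1, periodicity (2.2), band (2.3))."""
    N = len(alpha)
    idx = range(1, N + 1)
    rows = all(sum(alpha[i - 1]) == 1 for i in idx)
    cols = all(sum(alpha[i - 1][j - 1] for i in idx) == 1 for j in idx)
    # Shifting both indices by M once is enough; other shifts follow by iteration.
    periodic = all(
        alpha[i - 1][j - 1] == alpha[mod1(i + M, N) - 1][mod1(j + M, N) - 1]
        for i in idx for j in idx)
    band = all(alpha[i - 1][j - 1] == 0
               for i in idx for j in idx if cycle_dist(i, j, N) > beta)
    return rows, cols, periodic, band


lepf_ok = check_assumptions(alpha_lepf(3, 3, 1), M=3, beta=3)  # beta = M - 1 + theta
ibpf_ok = check_assumptions(alpha_ibpf(3, 3), M=3, beta=2)     # beta = M - 1
```

For N = 9, M = 3 and $\theta = 1$ the band width $M - 1 + \theta = 3$ cannot be reduced: with `beta=2` the band check fails for the LEPF matrix.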

Figures 1(a) and 1(b) show the matrices defined in (5)-(6) in the case N = 9,
M = 3 and $\theta = 1$. It follows from Assumptions (2.1) and (2.4) that $\alpha_\infty$ is,
like each $\alpha_N$, a row-stochastic matrix, which can be thought of as specifying
the transition probabilities of a $\mathbb{Z}$-valued Markov chain. It turns out that the
asymptotic variance in our CLT is expressed in terms of two copies of this chain.
To this end, denote by $\mathbb{E}_{u,v}$, where $u, v \in \mathbb{Z}$, the expectation w.r.t.
the law of the bi-variate backward Markov chain $(I_k, J_k)_{0 \le k \le n}$, where

\[ (I_n, J_n) \sim \delta_u \otimes \delta_v, \]
\[ \mathbb{P}(I_k = i_k, J_k = j_k \mid I_{k+1} = i_{k+1}, J_{k+1} = j_{k+1}) = \alpha_\infty^{i_{k+1} i_k}\, \alpha_\infty^{j_{k+1} j_k}. \tag{7} \]
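To build intuition for the "colliding Markov chains" interpretation, the backward pair $(I_k, J_k)$ can be simulated directly. The sketch below assumes that the two coordinates move independently, each according to the rows of $\alpha_\infty$ (our reading of "two copies of this chain"), and that the LEPF kernel carries the factor $1/M$; the function names are hypothetical.

```python
import random


def step_backward(i, M, theta, rng):
    """One transition i_{k+1} -> i_k under row i_{k+1} of alpha_infinity.

    theta = 0 gives the IBPF kernel, theta >= 1 the LEPF kernel. Row i puts
    mass 1/M on each j with floor((j - theta - 1)/M) = floor((i - 1)/M).
    """
    first = ((i - 1) // M) * M + 1           # smallest label in the group of i
    return first + theta + rng.randrange(M)  # uniform over the M admissible j


def collision_indicators(u, v, n, M, theta, rng):
    """Simulate (I_k, J_k) from k = n down to k = 0; entry k is [I_k == J_k]."""
    I, J = u, v
    out = [I == J]
    for _ in range(n):
        I = step_backward(I, M, theta, rng)
        J = step_backward(J, M, theta, rng)
        out.append(I == J)
    return out[::-1]  # reverse so that index k corresponds to time k


rng = random.Random(1)
ibpf_far = collision_indicators(1, 4, 50, M=3, theta=0, rng=rng)
lepf_path = collision_indicators(1, 1, 50, M=3, theta=1, rng=rng)
```

For the IBPF, chains started in different groups stay in their own groups and never meet, whereas under the LEPF kernel each chain drifts by $\theta$ per step relative to the group boundaries, so the collision pattern differs between the two algorithms.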
Figure 1: Matrices in (a) and (b) correspond to the LEPF and IBPF, respectively.

Figure 2: Some of the paths assigned positive probability by $\alpha_\infty$ for the (a)
LEPF and (b) IBPF. In both cases M = 3 and in (a) $\theta = 1$.

Figure 2 illustrates some segments of paths for I (or J) which have strictly
positive probability under the transitions $\alpha_\infty$ for the LEPF and IBPF, with
M = 3 and $\theta = 1$.

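Before stating our main result we introduce some more notation. For all
$n \ge 1$, define non-negative kernels $Q_n : \mathsf{X} \times \mathcal{X} \to \mathbb{R}_+$ as

\[ Q_n(x, A) := g_{n-1}(x) f(x, A), \quad \forall x \in \mathsf{X},\ A \in \mathcal{X}, \tag{8} \]

and the corresponding operators on functions and measures

\[ Q_n(\varphi)(x) := \int Q_n(x, dx')\, \varphi(x'), \qquad \mu Q_n(A) := \int \mu(dx)\, Q_n(x, A), \]

respectively. Moreover we define for $n \ge 1$ and $0 \le p < n$

\[ Q_{p,p} := \mathrm{Id}, \quad Q_{p,n} := Q_{p+1} \cdots Q_n, \quad \bar{Q}_{p,n} := \bar{Q}_{p+1} \cdots \bar{Q}_n, \]

where $\bar{Q}_n := Q_n / \pi_{n-1}(g_{n-1})$ for all $n \ge 1$. Also let

\[ \gamma_n := \pi_0 Q_{0,n}, \quad \forall n \ge 0. \]

Define the tensor-product kernel
$Q_n^{\otimes 2}((x,y), d(x',y')) := Q_n(x, dx')\, Q_n(y, dy')$,
with the corresponding operators on functions and measures written similarly
to those for $Q_n$, and finally define operators $C_0$ and $C_1$, such that for any
$\varphi \in \mathcal{B}(\mathsf{X}^2)$,

\[ C_0(\varphi)(x,y) = \varphi(x,y) \quad\text{and}\quad C_1(\varphi)(x,y) = \varphi(x,x), \quad \forall x, y \in \mathsf{X}. \]

We then have: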

We then have: 

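Theorem 1. Fix $M \ge 1$ and $\beta > 0$ and suppose that Assumption 2 holds. Then
for any $\varphi \in \mathcal{B}(\mathsf{X})$, Algorithm 3 has the property

\[ \sqrt{N}\, \bigl( \pi_n^N(\varphi) - \pi_n(\varphi) \bigr) \xrightarrow{d} \mathcal{N}(0, \sigma_n^2), \quad \forall n \in \mathbb{N}, \]

where N goes to infinity along the sequence $\{Mm : m = 1, 2, \ldots\}$, the following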
variances are assumed strictly positive, 


CTo = 'XoUt - 



n > 1 


with $\epsilon_k = \mathbb{I}[I_k = J_k]$, for all $0 \le k \le n$, and $\bar{\varphi} := \varphi - \pi_n(\varphi)$.
Remark 1. Since the LEPF and IBPF are special cases of Algorithm 3,
Theorem 1 applies to them immediately. Note that the only distinction between the
asymptotic variances for the LEPF and the IBPF arises from $\alpha_\infty$, as given for
these two algorithms in (5) and (6). In Section 5 we shall examine the asymptotic
variance for the LEPF and the IBPF in detail, which involves study of the I, J
processes for these two algorithms.


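3.2. Martingale array and the proof of the main result

Defining the random measures

\[ \Gamma_n^N := \frac{1}{N} \sum_{i=1}^N W_n^i\, \delta_{\zeta_n^i}, \quad \forall n \in \mathbb{N}, \tag{9} \]

allows us to decompose the particle approximation error as

\[ \pi_n^N(\varphi) - \pi_n(\varphi) = \frac{\Gamma_n^N(\bar{\varphi})}{\gamma_n(1)} - \frac{\pi_n^N(\bar{\varphi})}{\gamma_n(1)} \bigl( \Gamma_n^N(1) - \gamma_n(1) \bigr), \tag{10} \]

where $\bar{\varphi} := \varphi - \pi_n(\varphi)$.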

Our overall strategy in proving Theorem 1 is to establish asymptotic
normality of $\sqrt{Mm}\, \Gamma_n^{Mm}(\bar{\varphi})$ as $m \to \infty$ using the CLT for
martingale arrays [16], and to apply results from [T3] to show that the second term
on the r.h.s. of (10) converges to zero in probability. Our first step is to identify
a martingale representation for $\sqrt{Mm}\, \Gamma_n^{Mm}(\bar{\varphi})$, for which the
setup is as follows.

Fix $n \in \mathbb{N}$ and $M \ge 1$. For given $m \ge 1$ and
$\varphi \in \mathcal{B}(\mathsf{X})$ define, for $\varrho \in [Mm]$,


Am ._ 

So ■ — 


1 


■ a[M^ 
and for $\varrho \in [(n+1)Mm] \setminus [Mm]$,


(Qo.n(<^)(Co) -’^oQo,n(‘/?)): 




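where $p = p_m(\varrho)$, $i = i_m(\varrho)$ and

\[ p_m(\varrho) := \left\lceil \frac{\varrho}{Mm} \right\rceil - 1 \quad\text{and}\quad i_m(\varrho) := \varrho \bmod Mm. \]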


( 11 ) 


(12) 
Writing out the expression for $W_p^{i_p}$, $p \ge 1$, in Algorithm 3, using the fact
that $\alpha$ is row-stochastic and Assumption 1,

\[ W_p^{i_p} = \sum_{(i_0, \ldots, i_{p-1})} \prod_{q=0}^{p-1} \alpha^{i_{q+1} i_q}\, g_q(\zeta_q^{i_q}) \le \prod_{q=0}^{p-1} \|g_q\|_\infty < \infty. \]


Combining this with (11), (12) and again using Assumption 1, we have

1 n^=o ii5,ii 


sup 

Q£[{n-\-l) M m] 


X ~—m-osc(Q ((p)) < oo, (13) 

vMto pe{o,...,n} 7p(l) 


with the convention = 1- 

In our $m \to \infty$ analysis we consider the quantities in (11)-(12) associated
with an instance of Algorithm 3 for each $m \ge 1$. We harmlessly assume that
$\mathbb{P}$ makes these instances statistically independent, but we commit an abuse,
especially in (14) below, and suppress from the notation the association of
$\{\zeta_p^i : i \in [Mm]\}$, and various other objects, with the particular value m.

For each $m \ge 0$ define $\mathcal{F}_0^m := \{\emptyset, \Omega\}$, and then define
$\sigma$-algebras $\{\mathcal{F}_\varrho^m : 1 \le \varrho \le (n+1)Mm,\ m \ge 1\}$
recursively by

\[ \mathcal{F}_\varrho^m := \mathcal{F}_{(n+1)(m-1)M}^{m-1} \vee \sigma\bigl( \zeta_{p_m(1)}^{i_m(1)}, \ldots, \zeta_{p_m(\varrho)}^{i_m(\varrho)} \bigr). \tag{14} \]


With these definitions in hand, we can state the following result. The bound
in (15) summarises (13), the rest of the statement is a direct application of [T3,
Proposition 1 and Theorem 1] and provides what we shall need: the desired
martingale structure and bounds on the particle approximation errors.

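Proposition 1. Fix $n \ge 0$, $\beta > 0$ and $M \ge 1$ and suppose that Assumption
(2.1) holds. For any $\varphi \in \mathcal{B}(\mathsf{X})$ there exists
$C_n \in \mathbb{R}$ such that

\[ |\xi_\varrho^m| \le \frac{1}{\sqrt{Mm}}\, C_n, \quad \forall m \ge 1,\ \varrho \in [(n+1)Mm]. \tag{15} \]

For each $m \ge 1$,
$\bigl\{ \bigl( \sum_{s=1}^{\varrho} \xi_s^m, \mathcal{F}_\varrho^m \bigr) : \varrho \in [(n+1)Mm] \bigr\}$
is a zero-mean, square integrable martingale and

\[ \sum_{\varrho=1}^{(n+1)Mm} \xi_\varrho^m = \sqrt{Mm}\, \frac{\Gamma_n^{Mm}(\bar{\varphi})}{\gamma_n(1)}. \tag{16} \]

Moreover, for any $p \ge 1$

\[ \sup_{M, m \ge 1} \sqrt{Mm}\; \mathbb{E}\bigl[ |\pi_n^{Mm}(\varphi) - \pi_n(\varphi)|^p \bigr]^{1/p} < \infty, \tag{17} \]
\[ \sup_{M, m \ge 1} \sqrt{Mm}\; \mathbb{E}\bigl[ |\Gamma_n^{Mm}(1) - \gamma_n(1)|^p \bigr]^{1/p} < \infty. \tag{18} \]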
Remark 2. By a Borel-Cantelli argument, it follows from (17)-(18) that for
both the LEPF and IBPF, the particle approximation errors
$\pi_n^{Mm}(\varphi) - \pi_n(\varphi)$ and $\Gamma_n^{Mm}(1) - \gamma_n(1)$ converge to zero
almost surely, both in the regime M fixed, $m \to \infty$ and in the regime m fixed,
$M \to \infty$.

Remark 3. It follows from the martingale part of Proposition 1 that
$\mathbb{E}[\Gamma_n^{Mm}(1)] = \gamma_n(1)$, implying that for the LEPF and IBPF,
$\Gamma_n^{Mm}(1)$ is an unbiased approximation of the normalising constant
$\gamma_n(1)$; a fact implying that these algorithms
are also suitable for implementation as a part of a particle Markov chain Monte
Carlo algorithm m-. Some of the arguments in the proof of Theorem 1 could
be adapted to establish asymptotic normality of
$\sqrt{Mm}\,(\Gamma_n^{Mm}(1) - \gamma_n(1))$ with
M fixed, $m \to \infty$, but the details are beyond the scope of this paper.


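In order to establish asymptotic normality of $\sqrt{Mm}\, \Gamma_n^{Mm}(\bar{\varphi})$
we shall apply the following special case of [16, Theorem 3.2].

Theorem 2. Fix $n \ge 0$ and $M \ge 1$. For each $m \ge 1$, suppose that
$\bigl\{ \bigl( \sum_{s=1}^{\varrho} \xi_s^m, \mathcal{F}_\varrho^m \bigr) : \varrho \in [(n+1)Mm] \bigr\}$
is a zero-mean, square integrable martingale, and that
$\mathcal{F}_\varrho^m \subseteq \mathcal{F}_\varrho^{m+1}$ for each
$\varrho \in [(n+1)Mm]$. If

\[ \sum_{\varrho=1}^{(n+1)Mm} \mathbb{E}\bigl[ (\xi_\varrho^m)^2\, \mathbb{I}[|\xi_\varrho^m| > \varepsilon] \bigm| \mathcal{F}_{\varrho-1}^m \bigr] \xrightarrow[m \to \infty]{\mathbb{P}} 0, \quad \forall \varepsilon > 0, \tag{19} \]
\[ \sum_{\varrho=1}^{(n+1)Mm} \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigm| \mathcal{F}_{\varrho-1}^m \bigr] \xrightarrow[m \to \infty]{\mathbb{P}} \sigma_n^2, \quad \sigma_n^2 > 0, \tag{20} \]

then

\[ \sum_{\varrho=1}^{(n+1)Mm} \xi_\varrho^m \xrightarrow[m \to \infty]{d} \mathcal{N}(0, \sigma_n^2). \tag{21} \]

We now present the main arguments in the proof of Theorem 1.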

(21) 
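Proof of Theorem 1. We first note that the case n = 0 is trivial, since in
Algorithm 3, $\{\zeta_0^i : i \in [N]\}$ are i.i.d. samples from $\pi_0$. So it remains to
consider $n \ge 1$. With the definitions (11), (12), and (14), Proposition 1
establishes that

\[ \Bigl\{ \Bigl( \sum_{s=1}^{\varrho} \xi_s^m, \mathcal{F}_\varrho^m \Bigr) : \varrho \in [(n+1)Mm] \Bigr\} \]

constitutes the martingale array as in the statement of Theorem 2, and our next
task is to check conditions (19) and (20).

Condition (19) is easily seen to be satisfied due to (15). The majority of our
work then goes into checking (20). Since, for given $m \ge 1$,
$\{\xi_\varrho^m : \varrho \in [(n+1)Mm]\}$ is a martingale difference sequence, we have

\[ \sum_{\varrho=1}^{(n+1)Mm} \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigm| \mathcal{F}_{\varrho-1}^m \bigr] = \mathbb{E}\Biggl[ \Biggl( \sum_{\varrho=1}^{(n+1)Mm} \xi_\varrho^m \Biggr)^{2} \Biggr] + \sum_{\varrho=1}^{(n+1)Mm} \Bigl( \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigm| \mathcal{F}_{\varrho-1}^m \bigr] - \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigr] \Bigr). \]

Proposition 2 in Section 3.3 establishes convergence to zero of the residual, in
the sense that

\[ \sum_{\varrho=1}^{(n+1)Mm} \Bigl( \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigm| \mathcal{F}_{\varrho-1}^m \bigr] - \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigr] \Bigr) \xrightarrow[m \to \infty]{\mathbb{P}} 0. \]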


Proposition 3 in Section 3.4 establishes convergence of the variance, in the sense
that


E 


' (n+l)Mm \ 2 

E c 

e=i / 

1 




0<u<M 

]i?]<2n/3 


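where $\epsilon_k = \mathbb{I}[I_k = J_k]$. Thus condition (20) is satisfied and so by (16)
in Proposition 1,

\[ \sqrt{Mm}\, \frac{\Gamma_n^{Mm}(\bar{\varphi})}{\gamma_n(1)} \xrightarrow[m \to \infty]{d} \mathcal{N}(0, \sigma_n^2). \tag{22} \]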


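By (2.1) we can use (17) and (18) of Proposition 1 and Hölder’s inequality to
obtain

\[ \lim_{m \to \infty} \mathbb{E}\Biggl[ \Biggl| \sqrt{Mm}\, \frac{\pi_n^{Mm}(\bar{\varphi})}{\gamma_n(1)} \bigl( \Gamma_n^{Mm}(1) - \gamma_n(1) \bigr) \Biggr| \Biggr] \le \lim_{m \to \infty} \frac{1}{\gamma_n(1)}\, \mathbb{E}\bigl[ |\pi_n^{Mm}(\bar{\varphi})|^2 \bigr]^{1/2} \sup_{m \ge 1} \sqrt{Mm}\; \mathbb{E}\bigl[ |\Gamma_n^{Mm}(1) - \gamma_n(1)|^2 \bigr]^{1/2} = 0, \]

implying

\[ \sqrt{Mm}\, \frac{\pi_n^{Mm}(\bar{\varphi})}{\gamma_n(1)} \bigl( \Gamma_n^{Mm}(1) - \gamma_n(1) \bigr) \xrightarrow[m \to \infty]{\mathbb{P}} 0. \tag{23} \]

The claim follows by Slutsky’s theorem from (10), (22) and (23). □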


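3.3. Convergence of the residual to zero

Proposition 2. Under the assumptions of Theorem 1,

\[ \sum_{\varrho=1}^{(n+1)Mm} \Bigl( \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigm| \mathcal{F}_{\varrho-1}^m \bigr] - \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigr] \Bigr) \xrightarrow[m \to \infty]{\mathbb{P}} 0. \]

Proof. Define:

\[ Z_\varrho^m := \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigm| \mathcal{F}_{\varrho-1}^m \bigr] - \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigr]. \]

By Markov’s inequality we have for all $\varepsilon > 0$ that

\[ \mathbb{P}\Biggl( \Biggl| \sum_{\varrho=1}^{(n+1)Mm} Z_\varrho^m \Biggr| > \varepsilon \Biggr) \le \frac{1}{\varepsilon^2} \sum_{\varrho=1}^{(n+1)Mm} \mathbb{E}\bigl[ (Z_\varrho^m)^2 \bigr] + \frac{1}{\varepsilon^2} \sum_{\varrho=1}^{(n+1)Mm} \sum_{\varrho' \ne \varrho} \mathbb{E}\bigl[ Z_\varrho^m Z_{\varrho'}^m \bigr]. \tag{24} \]

By (15), $(Z_\varrho^m)^2 \le 4 C_n^4 / (Mm)^2$ and hence

\[ \sum_{\varrho=1}^{(n+1)Mm} \mathbb{E}\bigl[ (Z_\varrho^m)^2 \bigr] \le \frac{4 (n+1) C_n^4}{Mm} \xrightarrow[m \to \infty]{} 0. \]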


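To establish convergence to zero of the second summation on the r.h.s. of
(24), we shall show that suitably many pairs $Z_\varrho^m$, $Z_{\varrho'}^m$ are
independent, therefore making no contribution to the sum since
$\mathbb{E}[Z_\varrho^m] = 0$, and use (15) to bound the
remaining pairs. Introduce the notation, for $i \in [Mm]$,

\[ \mathrm{pa}(\zeta_0^i) := \emptyset, \qquad \mathrm{pa}(\zeta_n^i) := \bigl\{ \zeta_q^j : 0 \le q < n,\ j \in [Mm],\ (\alpha^{n-q})^{ij} > 0 \bigr\}, \quad n \ge 1, \tag{25} \]

and by convention let $\sigma(\emptyset)$ be the trivial $\sigma$-algebra. Our strategy to
obtain a lower bound for the number of independent pairs $Z_\varrho^m$, $Z_{\varrho'}^m$
is as follows:

Lemma 1 shows that $Z_\varrho^m$ is measurable w.r.t.
$\sigma(\mathrm{pa}(\zeta_{p_m(\varrho)}^{i_m(\varrho)}))$, and consequently

\[ \sigma\bigl( \mathrm{pa}(\zeta_{p_m(\varrho)}^{i_m(\varrho)}) \bigr) \perp \sigma\bigl( \mathrm{pa}(\zeta_{p_m(\varrho')}^{i_m(\varrho')}) \bigr) \ \Longrightarrow\ Z_\varrho^m \perp Z_{\varrho'}^m. \]

Lemma 2 shows that for any $0 \le p, q \le n$ and $i, j \in [Mm]$,

\[ \mathrm{pa}(\zeta_p^i) \cap \mathrm{pa}(\zeta_q^j) = \emptyset \ \Longrightarrow\ \sigma(\mathrm{pa}(\zeta_p^i)) \perp \sigma(\mathrm{pa}(\zeta_q^j)). \]

Lemma 3 shows that the number of pairs $\varrho \ne \varrho'$ such that
$\mathrm{pa}(\zeta_{p_m(\varrho)}^{i_m(\varrho)}) \cap \mathrm{pa}(\zeta_{p_m(\varrho')}^{i_m(\varrho')}) = \emptyset$
is at least $Mm(n+1)^2(Mm - 4n\beta - 1)$.

The total number of pairs $(\varrho, \varrho')$ where $\varrho \ne \varrho'$ is
$(n+1)Mm((n+1)Mm - 1)$ and hence by (15)

\[ \sum_{\varrho=1}^{(n+1)Mm} \sum_{\varrho' \ne \varrho} \mathbb{E}\bigl[ Z_\varrho^m Z_{\varrho'}^m \bigr] \le \frac{4 C_n^4}{Mm} \Bigl( (n+1)\bigl((n+1)Mm - 1\bigr) - (n+1)^2 (Mm - 4n\beta - 1) \Bigr), \]

which is easily seen to converge to 0 as $m \to \infty$, completing the proof of the
Proposition. □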
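The counting at the end of this proof is easy to check numerically. The sketch below uses only the two counts stated above (total ordered pairs and the lower bound on independent pairs) together with the per-pair bound $4C_n^4/(Mm)^2$ from (15); the function names and the default value of the constant are ours.

```python
def dependent_pair_bound(n, M, m, beta):
    """Upper bound on the number of ordered pairs that may be dependent."""
    N = M * m
    total = (n + 1) * N * ((n + 1) * N - 1)
    independent = N * (n + 1) ** 2 * (N - 4 * n * beta - 1)
    return total - independent


def residual_bound(n, M, m, beta, C=1.0):
    # Each possibly dependent pair contributes at most 4 C^4 / (Mm)^2.
    return 4 * C ** 4 * dependent_pair_bound(n, M, m, beta) / (M * m) ** 2


small = residual_bound(n=1, M=3, m=10, beta=3)
large = residual_bound(n=1, M=3, m=100, beta=3)
```

The number of possibly dependent pairs simplifies to $(n+1)Mm\,[(n+1)(4n\beta+1) - 1]$, which grows only linearly in m, so the bound decays like $1/m$.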


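Before presenting Lemmata 1-3 we point out the following useful consequence
of (25). Note that $\zeta_q^{i_q} \in \mathrm{pa}(\zeta_p^{i_p})$ if and only if there exists
a sequence $(i_q, \ldots, i_p)$, such that $\prod_{k=q}^{p-1} \alpha^{i_{k+1} i_k} > 0$. Using
this equivalence it follows that if $0 \le r < q < p$ and
$\zeta_r^{i_r} \in \mathrm{pa}(\zeta_q^{i_q})$ and $\zeta_q^{i_q} \in \mathrm{pa}(\zeta_p^{i_p})$,
then also $\zeta_r^{i_r} \in \mathrm{pa}(\zeta_p^{i_p})$, and
thus we have the implication

\[ \zeta \in \mathrm{pa}(\zeta_p^{i_p}) \ \Longrightarrow\ \mathrm{pa}(\zeta) \subseteq \mathrm{pa}(\zeta_p^{i_p}). \tag{26} \]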

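Lemma 1. For any $\varrho \in [(n+1)Mm]$, $Z_\varrho^m$ is measurable w.r.t.
$\sigma(\mathrm{pa}(\zeta_{p_m(\varrho)}^{i_m(\varrho)}))$.

Proof. The variables $\{\zeta_0^i\}_{i \in [Mm]}$ are independent, so for
$\varrho \in [Mm]$, $Z_\varrho^m = 0$, $\mathbb{P}$-a.s. and
$\sigma(\mathrm{pa}(\zeta_{p_m(\varrho)}^{i_m(\varrho)})) = \sigma(\emptyset)$, hence the
claimed measurability holds. For $Mm < \varrho \le (n+1)Mm$ we need to show that

\[ \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigm| \mathcal{F}_{\varrho-1}^m \bigr] = \mathbb{E}\bigl[ (\xi_\varrho^m)^2 \bigm| \mathrm{pa}(\zeta_{p_m(\varrho)}^{i_m(\varrho)}) \bigr], \quad \mathbb{P}\text{-a.s.} \tag{27} \]

According to Algorithm 3,

\[ \mathbb{P}(\zeta_p^i \in A \mid \zeta_0, \ldots, \zeta_{p-1}) = \frac{\sum_j \alpha^{ij} W_{p-1}^j g_{p-1}(\zeta_{p-1}^j)\, f(\zeta_{p-1}^j, A)}{W_p^i}, \quad \mathbb{P}\text{-a.s.} \tag{28} \]

Writing out the expression for $W_p^{i_p}$ from Algorithm 3 gives

\[ W_p^{i_p} = \sum_{(i_0, \ldots, i_{p-1})} \prod_{q=0}^{p-1} \alpha^{i_{q+1} i_q}\, g_q(\zeta_q^{i_q}), \tag{29} \]

which clearly is measurable w.r.t. $\sigma(\mathrm{pa}(\zeta_p^{i_p}))$. Noting additionally
(26), we find the r.h.s. of (28) also measurable w.r.t. $\sigma(\mathrm{pa}(\zeta_p^i))$. The
latter observation combined with the fact that in Algorithm 3 the variables
$\{\zeta_p^i\}_{i \in [Mm]}$ are conditionally independent given
$\zeta_0, \ldots, \zeta_{p-1}$, shows that

\[ \mathbb{P}(\zeta_p^i \in A \mid \zeta_0, \ldots, \zeta_{p-1}, \zeta_p^1, \ldots, \zeta_p^{i-1}) = \mathbb{P}(\zeta_p^i \in A \mid \mathrm{pa}(\zeta_p^i)), \quad \mathbb{P}\text{-a.s.} \tag{30} \]

Then using again the fact that $W_p^i$ is measurable w.r.t.
$\sigma(\mathrm{pa}(\zeta_p^i))$, we have by (12) that (27) holds for
$Mm < \varrho \le Mm(n+1)$, which completes the proof. □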


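Lemma 2. For any $0 \le p, q \le n$ and $i, j \in [Mm]$,

\[ \mathrm{pa}(\zeta_p^i) \cap \mathrm{pa}(\zeta_q^j) = \emptyset \ \Longrightarrow\ \sigma(\mathrm{pa}(\zeta_p^i)) \perp \sigma(\mathrm{pa}(\zeta_q^j)). \tag{31} \]

Proof. The implication in (31) holds immediately in the case that $p, q \in \{0, 1\}$,
due to the convention that $\sigma(\emptyset)$ is the trivial $\sigma$-algebra and the
independence of the $\zeta_0^i$'s. So suppose w.l.o.g. $p > 1$ and $0 \le q \le p$, fix any
$i, j \in [Mm]$ and assume that $\mathrm{pa}(\zeta_p^i) \cap \mathrm{pa}(\zeta_q^j) = \emptyset$.
For $0 \le r < p$, define the sets of random variables

\[ \mathcal{Z}_r := \mathrm{pa}(\zeta_p^i) \cap \{\zeta_s^k : 0 \le s \le r,\ k \in [Mm]\}, \]
\[ \mathcal{Z}'_r := \mathrm{pa}(\zeta_q^j) \cap \{\zeta_s^k : 0 \le s \le r,\ k \in [Mm]\}, \]

notice that $\mathcal{Z}_{p-1} = \mathrm{pa}(\zeta_p^i)$ and similarly
$\mathcal{Z}'_{p-1} = \mathrm{pa}(\zeta_q^j)$, so our objective is to
prove $\sigma(\mathcal{Z}_{p-1}) \perp \sigma(\mathcal{Z}'_{p-1})$. Notice also that
$\mathcal{Z}_r \cap \mathcal{Z}'_r = \emptyset$ for $0 \le r < p$ since
we have assumed $\mathrm{pa}(\zeta_p^i) \cap \mathrm{pa}(\zeta_q^j) = \emptyset$. We proceed
with an inductive argument, the induction hypothesis being that for some
$0 \le r < p - 1$,

\[ \sigma(\mathcal{Z}_r) \perp \sigma(\mathcal{Z}'_r). \tag{32} \]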
To initialise, observe that (32) holds with r = 0, due when q = 0 to the
convention that $\sigma(\emptyset)$ is trivial, and due when q > 0 to the independence of
the $\zeta_0^i$'s and $\mathcal{Z}_0 \cap \mathcal{Z}'_0 = \emptyset$. Now assume that (32)
holds for some $0 \le r < p - 1$, for each
$\zeta \in \mathcal{Z}_{r+1} \cup \mathcal{Z}'_{r+1}$ let $B_\zeta$ be an arbitrary member
of $\mathcal{X}$ and let $A_\zeta$ be the event $\{\zeta \in B_\zeta\}$. Then writing
$\mathcal{G}_r := \sigma(\zeta_0, \ldots, \zeta_r)$, and with the convention that
products over the empty set are unity, we have



Gr] n 


c?. I p I n Ac 


Gr] n I[^c] 

CG2rUZ' 


i n 

u(z;) 

n i[^c] 

\ce2;+i\21 


C^Zr-UZ^ 


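The first equality uses the tower property of conditional expectations and the
fact that $\sigma(\mathcal{Z}_r) \vee \sigma(\mathcal{Z}'_r) \subseteq \mathcal{G}_r$. The
second and third equalities use the following facts: in Algorithm 3,
$\zeta_{r+1} = \{\zeta_{r+1}^k : k \in [Mm]\}$ are conditionally independent
given $\mathcal{G}_r$; for any $\zeta \in \mathcal{Z}_{r+1} \setminus \mathcal{Z}_r$ (resp.
$\zeta \in \mathcal{Z}'_{r+1} \setminus \mathcal{Z}'_r$),
$\mathbb{P}(A_\zeta \mid \mathcal{G}_r)$ is measurable
w.r.t. $\sigma(\mathrm{pa}(\zeta))$ (see (28)-(30));
$\mathrm{pa}(\zeta) \subseteq \{\zeta_s^k : 0 \le s \le r,\ k \in [Mm]\}$ and by (26)
$\mathrm{pa}(\zeta) \subseteq \mathrm{pa}(\zeta_p^i)$, hence
$\sigma(\mathrm{pa}(\zeta)) \subseteq \sigma(\mathcal{Z}_r)$ (resp.
$\sigma(\mathrm{pa}(\zeta)) \subseteq \sigma(\mathcal{Z}'_r)$). The fourth
equality holds by the induction hypothesis. By a monotone class argument, (32)
then holds with r replaced by r + 1, which completes the induction and hence
also the proof of (31). □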


Lemma 3. Under Assumption (2.3) the number of pairs $(\zeta_p^i, \zeta_q^j)$ such that $\mathrm{pa}(\zeta_p^i) \cap \mathrm{pa}(\zeta_q^j) = \emptyset$ is at least $Mm(n+1)^2(Mm - 4n\beta - 1)$.
Proof. We start by proving the implication

$$\prod_{q=0}^{n-1} \alpha^{i_{q+1} i_q} > 0 \;\Rightarrow\; \Delta(i_p, i_n) \le (n-p)\beta \le n\beta \quad \forall\, 0 \le p \le n. \qquad (33)$$

By (2.3), $\prod_{q=0}^{n-1} \alpha^{i_{q+1} i_q} > 0$ implies $\Delta(i_{p+1}, i_p) \le \beta$, $\forall\, 0 \le p < n$, and then since $\Delta$ is a metric, (33) follows from the triangle inequality.

Note that by (25) and (33), $\mathrm{pa}(\zeta_p^i) \subset \{\zeta_r^k ;\ 0 \le r < p,\ \Delta(i,k) \le p\beta\}$ and therefore when $i = 1$, $\mathrm{pa}(\zeta_p^1) \cap \mathrm{pa}(\zeta_q^j) = \emptyset$ for all $0 \le p, q \le n$ and $j \in \{2n\beta + 2, \ldots, Mm - 2n\beta\}$, the latter set being non-empty for all $m$ large enough, since $M$, $\beta$ and $n$ are fixed. Hence for $i = 1$ fixed, there are at least $(n+1)^2(Mm - 4n\beta - 1)$ pairs $(\zeta_p^1, \zeta_q^j)$ such that $\mathrm{pa}(\zeta_p^1) \cap \mathrm{pa}(\zeta_q^j) = \emptyset$. Then allowing $i$ to vary over the set $[Mm]$ gives the lower bound as claimed. □
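The counting step in the proof of Lemma 3 can be checked numerically. The sketch below assumes that $\Delta$ is the ring (circular) distance on $\{1,\ldots,Mm\}$, which is consistent with the characterisation $\Delta(u,v) = c$ if and only if $u = (v \pm c) \bmod N$ used in the proof of Proposition 3; the function names and the parameter values are illustrative and not taken from the paper.

```python
def circular_distance(u: int, v: int, size: int) -> int:
    """Ring distance between indices u and v taken from {1, ..., size}."""
    d = abs(u - v) % size
    return min(d, size - d)


def count_far_indices(size: int, n: int, beta: int, i: int = 1) -> int:
    """Count j in {1, ..., size} whose ring distance from i exceeds 2 * n * beta."""
    threshold = 2 * n * beta
    return sum(
        1 for j in range(1, size + 1)
        if circular_distance(i, j, size) > threshold
    )


if __name__ == "__main__":
    M, m, n, beta = 5, 40, 3, 6  # illustrative values only
    N = M * m
    # Both numbers should agree: the count of admissible j and Mm - 4*n*beta - 1.
    print(count_far_indices(N, n, beta), N - 4 * n * beta - 1)
```

Under this assumption the count does not depend on the choice of $i$, which is why letting $i$ range over $[Mm]$ multiplies the bound by $Mm$.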


3.4. Convergence of the variance

The main result of Section 3.4 is:


Proposition 3. Under the assumptions of Theorem^ for all n > 0 


lim E 

ra—^oo 


5 d 


M7„(1)2 


^ • • • Qf (p®2) 


Q<u<M 

\v\<2n(3 


where Ck = = Jk], for all 0 < k < n. 

From (16) and ([^ it follows that 

Mm 


E 


' (n+l)Mm \ 

i: c 

g=l / . 


7n(l)- 


E 


Mm 


J2^nnCn) 


(34) 


(35) 


The first step towards proving Proposition 3 is to develop an expression for the expectation on the r.h.s. of (35) in the following Lemma, which is inspired by tensor product analysis of [T5].
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Lemma 4. Fix nSN, M>1, to>1 and set N = Mm. For any tp £ ^(K), 
2‘ 


E 




= jv5 S 

(iO'.n,jO:n) V 9=0 / 

Proof. Throughout the proof we use the shorthand notations ip.,q = (ip,... ,iq) 
and jp.,q = (jp,.. .,jq), where q < p. For all 0 < fc < n, let Gk := a(Co,..., Ck), 
and let (p € ^(X^). 

For all i £ [N] 


E 


Gn- 




and for i ^ j 
E 


Gn- 




So for all i, j £ [N] we have 

E[{WfS^.^(^WiS^^){p)\Gn-i\ 


= ® (E«''^-i^C1_i)) (Qf (%=.](‘^)))- (36) 


In the remainder of the proof we write = l[ik = jk] for brevity. From (361 
we conclude that 


E 


F E ))(«') 




which we use to initialise a backward induction. The induction assumption is 
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that for some 1 < k < n, 
E 


1 


^ [ n j ® irf a^,.) (ofic..„ - - ■ oro 


(ifeinjfc™) \ 9=fc 


Then applying (|36| and the tower property of conditional expectations, 
E 




Gk-1 


1 

Jp 


( n-1 '' 

q=k / 


X E 


y^'=(5^., 0 ITf 5^.,) • • • QTCeJ<p) 


Qk-l 


' n—1 


n/^g+iJg 


(ifeinjt™) \ 9=fc 

X ff E ® ( E 


^-K—J 

n — 1 




QTc, 


E I n 


■^g+i^g rv^g+iJq 


Q.-'y-r-L-'yQ;. 


{ik-l:ndk-l-.n) \q=k-l 


X ® • • • QTCeS ^)), 

proving that the induction hypothesis holds at rank k — \. Thus 


E 




1 

7V2 


P: 

’ n—1 


Qo 


^ n ( fTo^o^^-o 0 ) [Q^Ce, ■ ■ ■ 


(*0;n JO:n) V 9=0 

Finally, since {Co : * G W]} are i.i.d. samples from ttq and Wq = 1 for all i G [N], 
we have 


E 


1 

N ■ 




= ^2 E ( 

{io-.n,joP V 9=0 / 
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from which the claim follows by observing that 



Proof of Proposition Throughout the proof we use the shorthand notations 

ip:q •— • ■ • 7^9)5 jp'Q ■” iJp^ ' • ' '^p:q “t” ^ ■” ifp “f • • • ■> iq T U) and 

jp:q + u := (jp + M,..., jq + u) for any u G Z and p, g G N such that q < p. Also 
we define 


, (fo:n —11 J0:n— 1 ) ■— ^Tq — 


0 '-'I[io=io]b:i 


(37) 


and 


ni„y„(io:n-l, jo:n-l) := HLo 0*5+1*’, 


9= 

TTOO /■ • \_TT^~1 ^^9+1^9 


(38) 


By Lemma 1^ we have 

(Mto)^E 


Mm 




— ^ ^ ,jn (^0:n—1 : JO:n — (^0:n—1 : JO:n — l) 


(39) 




where and are obtained by partitioning the summation set: 


^ ^ (zo:n—1) JO:n—(^ 0 :n —17 JO:n—l); 

^(*Ti)in)>2n^ 

Mm 

Bm-.= J2 E 


1 — 1 jn- {io-.n-ljo-.n-l 

A(*ti 


^ ^ (^0:n—1) JO:n—1 (^0:n—1 ? JO:n—1) ■ 

(40) 


Note that although not explicitly shown in the notation, Ili^j^{iQ:n-i, jo-.n-i) 
depends also on m through the size of matrix a, whilst Ei^j^(iQ:n-i,jo-.n-i) 
does not. 
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We shall prove that Am = 0 and that for all m large enough Bm is equal 


to the r.h.s. of (34). First consider Am- We can use the implication (33) 


given in the proof of Lemma and observe that if A{in,jn) > and 

nz„ j„(* 0 :n-i, Join-i) > Oj then by two applications of the triangle inequality 

jp) — jn) jn) ^ ^ij'pi'^n) ^{jpj jn') ^ d; 

and hence I[ip = jp] = 0, for all 0 < p < n. Consequently, by using the fact that 
TT^^gf^ . . . gf ^ ^ ^ 

we have 


A.rr-, — 


^ ■••gf = 0. ( 41 ) 


^(^n,in)>2n^ 

Next we consider Bm- Let us start by writing 


^in,3n ■— ^ ( Ilinjjn (*0:n —1) J0:n—1 —1) J0:n—l) ) 

(iO;n-l J0;n-l) 


(42) 


and 


0(ai,..., Up) := (^(ai+kM}modN,..., (ap + kM)modN'), V(ai,..., Up) G V, 
for some fixed k G 2. and any p > 0. First we prove that B^ j^ satisfies 

(43) 


By (2.2) we have immediately, for all io-.mjo-.n G 

44^71,in (^0:n—l; J0:n—l) — (0(^O:n—1; J0:n—1)) ; (44) 

and also 


— jo:n—l') — ,in ) (^(^0:n—1 5 i0:n—1)) • 


(45) 


Combining (H^, ([44|, (45) and using the fact that (j) : [Mm]" x [MmY 
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[Mm]" X [Mm]" is a bijection to perform a change of variable, we can write 


^in,3n — ^ ^ n(/)(i„,j„) (?^(*0:n—1) J0:n—(^(*0:ra—1) J0:n —l)) 

(iO;n-l JO;n-l) 

^ ^ (zQ:y 2 _l, J0:n—1 (^0:n—1) J0:n—l) 

(iO;n-l JO;n-l) 


= B 




establishing (43). 


Since for any $u, v \in [N]$ and $0 \le c < N/2$, $\Delta(u,v) = c$ if and only if $u = (v \pm c) \bmod N$, we can re-parametrise the summations in (40), and by using (43) we have


m—1 M 

^"^ = EE E ^i+kM,{i+kM+c) mod N 

k=0 |c|<2n/3 

M 

“™E E ^Mo+^,(«o+f+c)mod JV 

e=l |c|<2ra/3 

for any 0 < ug < (m — 1)M. 

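Recall (33) from the proof of Lemma 3. An analogous implication

$$\prod_{q=0}^{n-1} \alpha_\infty^{i_{q+1} i_q} > 0 \;\Rightarrow\; |i_p - i_n| \le (n-p)\beta \le n\beta \quad \forall\, 0 \le p \le n, \qquad (47)$$

can be established for $\alpha_\infty$ by using the absolute difference instead of the metric $\Delta$.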

Let us set ug = 2>nj3 and assume that m > {ug + M + 3n/3)/M, which is 
legitimate since our aim is to find the limit of $B_m$ as $m \to \infty$. We then have


(ug + £ + c) mod JV = Ug + £ + c, y £ G [M], |c| < 2nj3, (48) 


and by using (33) and (47) one can check that when = ug + £ and j„ = 


Ug+£ + c, then ni„y„ (^om-i, jom-i) and Il^j^{ig:n-i, jg.n-i) are greater than 


zero only if /3 < iq+i, jq+i < Mm — (3, for all 0 < g < n. But by (2.4) 


* 5 + 1 *, _ Q,^+i « for all /3 < iq+i < Mm — /3 and iq € [Mm]. Thus we have by 
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(42) and (48) 


_ 

uo-\-£,{uo-\-£-\-c) mod N 

— ^ ^ J0:n—l)^uo+£,iio+^+c(^0:n —15 jo:n—l) (^9) 


Finally we use the fact that by (2.2) and (2.4) = q/^ for all z, j, k G 


Z and hence by ([^, ([^ and the fact that ^UQ-\-£,uo-\-i+c{io:n-iJo-.n-i) = 

^l,£-\-c(j'0:n— 15 JO:n—1) ; 


Bjn 

m 


M-l 




^—0 |c|<2n^ zo;n — 

jQ,r^-l^r 


n^,^+c(^0:n—1; 


Jo:n—1 )'^^,^+c(^0:n— 1 j j/o:n—1) 


M-l 

^=0 |c|<2n/3 



(50) 


where the last form is independent of m. The claim then follows by combining 


(35), (39), (41) and (50). 


□ 


4. Time-uniform convergence 


Recall from the earlier proposition that, for the algorithm, if Assumption (2.1) holds, then for each $n \in \mathbb{N}$ and $p \ge 1$,

$$\sup_{M,m \ge 1} \sqrt{Mm}\; \mathbb{E}\big[|\pi_n^{M,m}(\varphi) - \pi_n(\varphi)|^p\big]^{1/p} < \infty. \qquad (51)$$

In this section we establish conditions under which the LEPF and IBPF satisfy, for all $p \ge 1$:

$$\sup_{M \ge 1} \sup_{n \ge 0} \sqrt{M}\; \mathbb{E}\big[|\pi_n^{M,m}(\varphi) - \pi_n(\varphi)|^p\big]^{1/p} < \infty, \qquad (52)$$

and do not satisfy, for any $p \ge 1$:

$$\sup_{m \ge 1} \sup_{n \ge 0} \sqrt{m}\; \mathbb{E}\big[|\pi_n^{M,m}(\varphi) - \pi_n(\varphi)|^p\big]^{1/p} < \infty, \qquad (53)$$

where in (52), $m$ is fixed and in (53), $M$ is fixed. We note that (52) and (53) are equivalent to corresponding inequalities with $\sup_{M \ge 1}$ and $\sup_{m \ge 1}$ replaced by $\limsup_{M \to \infty}$ and $\limsup_{m \to \infty}$ respectively, since for $\varphi \in \mathcal{L}(\mathsf{X})$, $|\pi_n^{M,m}(\varphi) - \pi_n(\varphi)| \le \mathrm{osc}(\varphi) < \infty$.

We shall again leverage the fact that the LEPF and IBPF are instances of 
Algorithm which is itself an instance of aSMC from mi, where it was shown 


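that

$$\mathcal{E}_n^{Mm} := \frac{\big((Mm)^{-1} \sum_{i=1}^{Mm} W_n^i\big)^2}{(Mm)^{-1} \sum_{i=1}^{Mm} (W_n^i)^2}, \qquad N_n^{\mathrm{eff}}(M,m) := Mm\, \mathcal{E}_n^{Mm},$$

play a central role in time-uniform convergence. The quantity $N_n^{\mathrm{eff}}$ is commonly called the effective sample size. Note that by Jensen's inequality we always have $\mathcal{E}_n^{Mm} \le 1$, equivalently $N_n^{\mathrm{eff}}(M,m) \le Mm$. We shall appeal to the following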
result, which is a special case of [H Proposition 3] (in particular see the last 
displayed equation in mi Proof of Theorem 2]). 


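Proposition 4. Suppose that Assumption (2.1) holds and additionally,

$$\exists\, (\delta, \epsilon) \in [1,\infty)^2 \ \text{s.t.} \ \sup_{n \ge 0} \sup_{x,y} \frac{g_n(x)}{g_n(y)} \le \delta, \ \text{and} \ f(x,\cdot) \le \epsilon f(y,\cdot), \quad \forall\, x,y \in \mathsf{X}. \qquad (54)$$

Then there exists $\rho < 1$ and for each $p \ge 1$ a finite constant $c_p$ such that for any $n \ge 0$, $M \ge 1$, $m \ge 1$ and $\varphi \in \mathcal{L}(\mathsf{X})$, the algorithm has the property:

$$\mathbb{E}\big[|\pi_n^{M,m}(\varphi) - \pi_n(\varphi)|^p\big]^{1/p} \le \|\varphi\|_\infty \frac{c_p}{\sqrt{Mm}} \sum_{q=0}^{n} \rho^{n-q}\, \mathbb{E}\big[|\mathcal{E}_q^{Mm}|^{-p/2}\big]^{1/p}. \qquad (55)$$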


4.1. The regime $m$ fixed and $M \to \infty$

We shall now show that under the assumptions of Proposition 4, both the LEPF and IBPF satisfy (52). For the LEPF, this is a new result. For the IBPF, the result is not very surprising, since it is well known that under the strong but standard hypothesis (54), a single bootstrap particle filter is time-uniformly convergent (see [5, Section 7.4.3] and references therein). However, perhaps more surprising is the simplicity of the following argument, which applies to both the IBPF and LEPF.

It was noted in Section equation ([^ that for both the LEPF and IBPF, 
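for any $k \in [m]$,

$$W_n^i = W_n^j =: \mathbb{W}_n^k, \quad \forall\, i,j \in G_k. \qquad (56)$$

Consequently, for any $M, m \ge 1$ and $n \ge 0$,

$$\mathcal{E}_n^{Mm} = \frac{\big(m^{-1} \sum_{k=1}^{m} \mathbb{W}_n^k\big)^2}{m^{-1} \sum_{k=1}^{m} (\mathbb{W}_n^k)^2} = \frac{1}{m}\, \frac{\big(\sum_{k=1}^{m} \mathbb{W}_n^k\big)^2}{\sum_{k=1}^{m} (\mathbb{W}_n^k)^2} \ge \frac{1}{m}, \qquad (57)$$

where the inequality holds because the weights are non-negative, or alternatively $N_n^{\mathrm{eff}} \ge M$. Substituting the lower bound (57) into (55) gives (52) as claimed.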
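The bound behind (57) is easy to check numerically. The following is a minimal sketch, assuming the ratio in question is the squared mean of the weights divided by the mean of the squared weights, and that all $M$ particles in a group share one weight as in (56); the helper names are hypothetical.

```python
def ess_ratio(weights):
    """(mean of weights)^2 / (mean of squared weights); lies in (0, 1]."""
    count = len(weights)
    mean_w = sum(weights) / count
    mean_w2 = sum(w * w for w in weights) / count
    return mean_w ** 2 / mean_w2


def expand_group_weights(group_weights, M):
    """Give every one of the M particles in a group that group's weight."""
    return [w for w in group_weights for _ in range(M)]


if __name__ == "__main__":
    M = 5
    group_weights = [0.2, 3.0, 0.5]          # m = 3 groups, illustrative values
    weights = expand_group_weights(group_weights, M)
    ratio = ess_ratio(weights)
    # The ratio is at least 1/m, so the effective sample size is at least M.
    print(ratio, 1 / len(group_weights), len(weights) * ratio)
```

The extreme case in which a single group carries all the weight attains the lower bound $1/m$ exactly, while equal weights attain the upper bound $1$.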


4.2. The regime $M$ fixed and $m \to \infty$

The following proposition establishes that $\limsup_{n \to \infty} \sigma_n^2 = \infty$ is a sufficient condition for failure of (53). In Sections 5.1 and 5.2 we present examples for the IBPF and LEPF such that $\lim_{n \to \infty} \sigma_n^2 = \infty$ and (54) holds.


Proposition 5. Consider Algorithm^ Assume that the hypotheses of Theorem 
hold, fix ip £ ^(X) and M > 1. Then for any n > 0 and p > 1, 

lim (58) 

m->.oo yjM \ ) 


If $\limsup_{n \to \infty} \sigma_n^2 = \infty$, then (53) does not hold for any $p \ge 1$. If additionally (54) holds, then for the LEPF and IBPF, for any $p \ge 1$


lim sup sup E 

m—^oo n>0 


\ 


X 

kG[m] 


wh 


X 


j G [m] 




(59) 


Remark 4. The condition in (|59|) clearly rules out: 

wl: 


sup sup E 

m>l n>0 


max 


Wi 


< oo, 


feG[m] 

^jGlmJ 

which is exactly the key hypothesis of [5] as written in ([^ in the case e = 0. 
Note however, that whilst Proposition 5 establishes that (53) does not hold, i.e. time-uniform convergence at rate $m^{-1/2}$ does not occur, we have not ruled out


the possibility that time-uniform convergence occurs at some slower rate. Moreover, our negative result is of course valid only for the specific local exchange mechanism appearing in the algorithm studied here, which is only a special case of the more general framework of [S]. In Section 6.3 we shall comment on some possible algorithmic modifications to ensure time-uniform convergence.


Proof. To prove (581, we follow arguments used in the proof of [13 Theorem 12], 
who established a limit of the same form for a standard particle filter. We first 
recall the fact that for a sequence of random variables $(A_m)_{m \ge 1}$, if $A_m \to A$ in distribution for some $A$, and for some $p > 0$, $(|A_m|^p)_{m \ge 1}$ is uniformly integrable, then $\lim_{m \to \infty} \mathbb{E}[|A_m|^p] = \mathbb{E}[|A|^p]$, see [13 P-14, Theorem A]. As in the statement, fix
(p G ^(K), M > 1 and n > 0. Then set A^ = By Theorem]^ Am 

converges in distribution to a zero-mean Gaussian variable with variance affM. 


For any given p > 1 and <5 > 0, (511 implies sup^>]^ E[|Am|P+^] < oo, so by 
Lemma II.6.3], {\Amf‘)m>i is uniformly integrable. Therefore ([^ holds. 


If (53) were to hold, the r.h.s. of (58) would be upper-bounded by a finite constant possibly depending on $p$ and $M$, but independent of $n$. The latter would contradict $\limsup_{n \to \infty} \sigma_n^2 = \infty$. Hence (53) does not hold when $\limsup_{n \to \infty} \sigma_n^2 = \infty$.


Now assume (54) holds in addition to lim sup„_^oo = oo. In order to 


establish (59) by a contradiction, assume that for some p > 1 there is a constant 


dp such that 


lim sup sup E 

m—>-oo n >0 


E 


Wf 


= dp < oo. 


fcG[m] VSjG[m] 

Since for the IBPF and LEPF, Wf = Wi = for all i,j G Gfc, we have 

— 1, ' 2 


(60) 





= l/^n 


Mm 


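Combining this and (60) into the bound (55) of Proposition 4 gives

$$\limsup_{m \to \infty} \sup_{n \ge 0} \sqrt{m}\; \mathbb{E}\big[|\pi_n^{M,m}(\varphi) - \pi_n(\varphi)|^p\big]^{1/p} \le \|\varphi\|_\infty \frac{c_p\, d_p}{\sqrt{M}\,(1 - \rho)} < \infty,$$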


in turn implying (53), since $|\pi_n^{M,m}(\varphi) - \pi_n(\varphi)| \le \mathrm{osc}(\varphi) < \infty$. But we have already proved that (53) does not hold for any $p \ge 1$ when $\limsup_{n \to \infty} \sigma_n^2 = \infty$, hence the inequality in (60) does not hold for any $p \ge 1$. This completes the proof. □


5. A closer look at the asymptotic variance 

Our objective in this section is to develop more insight into the asymptotic variance in Theorem 1


(61) 


0<n<M 

\v\<2n0 


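for the LEPF and IBPF, especially regarding its behaviour as $n \to \infty$.

For the convenience of the reader we recall that in (61), $\mathbb{E}_{u,v}$ denotes expectation w.r.t. the law of the bi-variate Markov chain:

$$(I_n, J_n) \sim \delta_u \otimes \delta_v, \qquad \mathbb{P}(I_k = i_k, J_k = j_k \mid I_{k+1} = i_{k+1}, J_{k+1} = j_{k+1}) = \alpha_\infty^{i_{k+1} i_k}\, \alpha_\infty^{j_{k+1} j_k}, \qquad (62)$$

and thus the only distinction between the asymptotic variances for the LEPF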
and IBPF is through Ofoo, as given in ^ and 

To help develop insight, we consider a much simplified HMM:

$$f(x,\cdot) = \pi_0(\cdot) \quad \text{and} \quad g_n = g \in \mathcal{L}(\mathsf{X}), \quad \forall\, n \in \mathbb{N}. \qquad (63)$$

This is obviously quite unrealistic, so let us be clear about our motives.

Firstly, (63) can be understood as being a favourable assumption for the performance of the LEPF and IBPF: $f(x,\cdot) = \pi_0(\cdot)$ implies that $\pi_n = \pi_0$ and that the particles $\{\zeta_n^i : i \in [N]\}$ in both algorithms are i.i.d. samples from $\pi_0$ for all $n \in \mathbb{N}$. Nevertheless, we shall see later in this section, in conjunction with Section 4.2, that under this favourable assumption certain negative results can hold for the IBPF and LEPF, namely $\lim_{n \to \infty} \sigma_n^2 = \infty$ and lack of time-uniform convergence.
Secondly, we shall see that (63) makes the expression in (61) considerably more tractable, allowing us to make precise comparisons between the LEPF and IBPF. We shall see in Section 6.2 that our conclusions for this simplified HMM are consistent with results obtained by simulation for a more realistic stochastic volatility model.

Under ( [6^ , we have 7r„ = ttq, 7n(l) = 7J‘oQo,n(l) = and for all 

$ G and 1 < p < n. 


SO 


= I[4 = Jn]7ro(p") n (l += Jp] - 


P—0 
2\/i I 


= I[J„ = J„]7ro(p^)(l + 


where

$$Z_n := \sum_{p=0}^{n-1} \mathbb{I}[I_p = J_p]$$

and $c = \pi_0(g^2)/\pi_0(g)^2 - 1$. By (61) and (64), we thus have

M-l 

M 


TTote^) ~ M ^ “ M ^ 

^ 0<u<M u^O 




(64) 


(65) 


( 66 ) 


^U.U I ^ 

. L - ' ' ■ J M ^^ 

\v\<2n0 

where $t = \log(1 + c)$ and the second equality follows from the initial condition part of (62).

We thus observe the key role in the asymptotic variance played by the moment generating function of the random variable $Z_n$, whose interpretation is clear by (65): $Z_n$ is the number of times the Markov chains $I$ and $J$ collide in $n$ steps. Intuitively, the more frequent these collisions tend to be, the faster the growth of the asymptotic variance.

To help formalize this intuition, our next step is to characterise the law of $Z_n$ under (62) with $u = v$, for the IBPF and the LEPF, in order to understand how $\sigma_n^2$ behaves as $n \to \infty$. We stress that this law is a consequence only of (62) and does not depend on (63).


5.1. Law of $Z_n$ for the IBPF

In the case of the IBPF we see immediately by inspecting the definition of $\alpha_\infty$ (see also the accompanying figure) that when $u = v$ for any $u \in \mathbb{Z}$ in (62), $I$ and $J$ are sequences of i.i.d. random variables, each uniformly distributed on the set $\{\lfloor (u-1)/M \rfloor M + 1, \ldots, \lfloor (u-1)/M \rfloor M + M\}$. Hence the random variables $(\mathbb{I}[I_k = J_k])_{0 \le k < n}$ constitute a sequence of independent Bernoulli variables with success probability $M^{-1}$ and consequently

$$Z_n \sim \mathrm{Binomial}(n, 1/M), \qquad (67)$$

whatever the value of u (we note that this conclusion can also be deduced from 
na Lemma 3.2], which provides a non-asymptotic variance formula for a single 


bootstrap particle filter, i.e. N = M). Hence (66) can be further simplified to 


2 1 ^-1 
= _ y" ]E„, 

., 2 \ 


[e‘^"]=Eo,o[e‘^"], 


( 68 ) 


where $t = \log(1 + c)$, $c = \pi_0(g^2)/\pi_0(g)^2 - 1$.

By (67), $\mathbb{E}_{0,0}[e^{tZ_n}]$ is the moment generating function of a binomial distribution, so readily,

$$\mathbb{E}_{0,0}[e^{tZ_n}] = \Big(1 - \frac{1}{M} + \frac{e^t}{M}\Big)^n = \Big(1 + \frac{c}{M}\Big)^n. \qquad (69)$$

Thus when (63) holds, and assuming that $\pi_0(\bar\varphi^2) > 0$ and $c > 0$, for the IBPF $\sigma_n^2$ grows exponentially fast as $n \to \infty$. This can be considered a negative result for the IBPF compared to the standard bootstrap particle filter, for which it has been shown that under a variety of more realistic conditions the sequence $(\sigma_n^2)_{n \in \mathbb{N}}$ may be bounded by a finite constant, or is tight when the observation


sequence is treated as random pnnuminis]. When ( p3| holds one can easily 
construct ttq and g such that (|54|) holds and c > 0. 
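The closed form for the binomial moment generating function used above can be checked against a direct summation over the probability mass function. This is a small sketch with illustrative parameter values; the function names are not from the paper.

```python
from math import comb, exp, log


def binomial_mgf_by_sum(n: int, M: int, t: float) -> float:
    """E[exp(t * Z)] for Z ~ Binomial(n, 1/M), summed term by term."""
    p = 1.0 / M
    return sum(
        comb(n, z) * p ** z * (1.0 - p) ** (n - z) * exp(t * z)
        for z in range(n + 1)
    )


def binomial_mgf_closed_form(n: int, M: int, c: float) -> float:
    """(1 + c / M) ** n, where c = exp(t) - 1."""
    return (1.0 + c / M) ** n


if __name__ == "__main__":
    n, M, c = 30, 20, 0.2            # illustrative values only
    t = log(1.0 + c)
    print(binomial_mgf_by_sum(n, M, t), binomial_mgf_closed_form(n, M, c))
```

Since $c > 0$ makes the base $1 + c/M$ strictly larger than one, the quantity grows geometrically in $n$ for any fixed $M$.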


5.2. Example of $\sigma_n^2 \to \infty$ for the LEPF

Let us point out an example which satisfies (63), (54) and for which $\sigma_n^2 \to \infty$. Notice that for the LEPF, it follows easily from (62) and the form of $\alpha_\infty$ that for any $u \in \mathbb{Z}$
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and whatever the values of M and 0, 


E«,u[II[-^n = n]] = Eu,u 
hence we have the crude lower bound, 


II[/p — Jp 

.P—0 


1 




/ T^oig'^) 1 

\no{gy M 


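As we shall now demonstrate, one can readily construct examples for which $\pi_0(g^2)/(\pi_0(g)^2 M) > 1$, hence such that $\sigma_n^2 \to \infty$ exponentially fast for any $\varphi$ with $\pi_0(\bar\varphi^2) > 0$. Let $\mathsf{X} = \{0,1\}$, $p \in (0, 1/M)$, $\delta \in (0,1)$ and

$$\pi_0(0) = p, \quad \pi_0(1) = 1 - p, \quad g(0) = 1 - \delta, \quad g(1) = \delta.$$

Then, since

$$\frac{\pi_0(g^2)}{\pi_0(g)^2} = \frac{p(1-\delta)^2 + (1-p)\delta^2}{\big(p(1-\delta) + (1-p)\delta\big)^2} \;\xrightarrow[\delta \to 0]{}\; 1/p > M,$$

we can choose $\delta$ small enough that $\pi_0(g^2)/(\pi_0(g)^2 M) > 1$, whilst satisfying $g \in \mathcal{L}(\mathsf{X})$ and $g(x) > 0$, as required for the standing assumption and (54).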
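The two-point construction can be explored numerically. The sketch below evaluates the ratio $\pi_0(g^2)/\pi_0(g)^2$ for the example with $\pi_0(0) = p$, $g(0) = 1-\delta$, $g(1) = \delta$; the specific values of $M$, $p$ and $\delta$ are illustrative choices, not values used in the paper.

```python
def second_moment_ratio(p: float, delta: float) -> float:
    """pi_0(g^2) / pi_0(g)^2 for the two-point example on {0, 1}."""
    numerator = p * (1.0 - delta) ** 2 + (1.0 - p) * delta ** 2
    denominator = (p * (1.0 - delta) + (1.0 - p) * delta) ** 2
    return numerator / denominator


if __name__ == "__main__":
    M = 4                 # group size, illustrative
    p = 0.1               # must lie in (0, 1/M)
    for delta in (0.2, 0.05, 0.001):
        ratio = second_moment_ratio(p, delta)
        # The lower bound on the variance grows when ratio / M exceeds one.
        print(delta, ratio, ratio / M > 1.0)
```

As $\delta$ decreases the ratio approaches $1/p$, so any $p < 1/M$ eventually pushes the base of the crude lower bound above one.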


5.3. Law of Zn for the LEPF 

The interaction pattern illustrated in Figure makes study of the law 
of Zn more difficult for the LEPF than for the IBPF, but never-the-less we 
shall below derive an exact characterisation of the distribution of $Z_n$. Observe that $Z_n$ depends on $I$ and $J$ only through the sequence of indicator variables $(\mathbb{I}[I_k = J_k])_{0 \le k < n}$, but this sequence is unfortunately non-Markov and difficult to analyse directly. However the bi-variate process $(D, E)$, with $D := (D_k)_{0 \le k \le n}$, $E := (E_k)_{0 \le k \le n}$ and

$$D_k := \Big\lfloor \frac{I_{n-k} - 1}{M} \Big\rfloor - \Big\lfloor \frac{J_{n-k} - 1}{M} \Big\rfloor, \qquad E_k := \mathbb{I}[I_{n-k} = J_{n-k}], \quad \forall\, 0 \le k \le n,$$

is easier to deal with.

It follows from Uoa in (H), ([^ and some elementary manipulations (omitted 
for brevity) that the bi-variate sequence $(D_k, E_k)_{0 \le k \le n}$ is Markov and for any $u \in \mathbb{Z}$ the initial condition $(I_n, J_n) \sim \delta_u \otimes \delta_u$ implies $(D_0, E_0) \sim \delta_0 \otimes \delta_1$. Thus all statements about the law of functionals of $(D, E)$ in the remainder of Section 5.3 hold irrespective of the particular value of $u = v$ in (62).


By similarly elementary but lengthy manipulations it can be checked that 
D is also Markov, with for all 1 < fc < n, d S Z, 

D,\{D,., = 4 ~ (70) 

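and if $d_{k-1}, d_k \in \mathbb{Z}$ and $x \in \{0,1\}$ are such that $d_{k-1} = d_k = 0$, i.e. that the integer parts of $(I_{n-k} - 1)/M$ and $(J_{n-k} - 1)/M$, as well as $(I_{n-k+1} - 1)/M$ and $(J_{n-k+1} - 1)/M$, coincide, then

$$E_k \mid \{D_{k-1} = d_{k-1}, D_k = d_k, E_{k-1} = x\} \sim \mathrm{Bernoulli}\Big(\frac{M}{(M-\theta)^2 + \theta^2}\Big), \qquad (71)$$

and otherwise

$$E_k \mid \{D_{k-1} = d_{k-1}, D_k = d_k, E_{k-1} = x\} \sim \delta_0. \qquad (72)$$

By (71) and (72), for all $b \in [n]$,

$$Z_n \mid \{B = b\} \sim \mathrm{Binomial}\Big(b, \frac{M}{(M-\theta)^2 + \theta^2}\Big), \qquad (73)$$

where

$$B := \sum_{k=1}^{n} \mathbb{I}[D_{k-1} = 0]\, \mathbb{I}[D_k = 0]. \qquad (74)$$

Therefore it remains to derive the distribution of $B$; the distribution of $Z_n$ is then available by marginalisation.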

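We will write Beta-Binomial$(n, a, b)$ for the so-called beta binomial distribution [5T] specified for any $a, b > 0$ by the probability mass function

$$p(x) = \binom{n}{x} \frac{B(x + a,\, n - x + b)}{B(a, b)}, \quad \forall\, 0 \le x \le n, \qquad (75)$$

where $B(a, b)$ denotes the beta-function. The case Beta-Binomial$(n, 1, 0)$ is understood as the point mass $\delta_n$. Moreover, we write RWZ$(n)$ for the distribution specified by the probability mass function:

$$p(x) = \frac{1}{2^{2\lfloor n/2 \rfloor - x}} \binom{2\lfloor n/2 \rfloor - x}{\lfloor n/2 \rfloor}, \quad \forall\, 0 \le x \le \lfloor n/2 \rfloor, \qquad (76)$$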
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with convention (”) = 1 for all n > 0. As shown in Theorem 2], (76) is 
the distribution of the number of times a symmetric simple random walk on 
Z starting from zero returns to zero in n time-steps. The following result, in 


conjunction with (73), characterises the distribution of $Z_n$. The proof is in the Appendix.
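The interpretation of RWZ$(n)$ as a return-count distribution can be checked by brute force for small $n$. The sketch below assumes that the probability mass function in (76) has the form $p(x) = 2^{-(2\lfloor n/2\rfloor - x)} \binom{2\lfloor n/2\rfloor - x}{\lfloor n/2\rfloor}$, which is a reconstruction rather than a quotation, and compares it with an exhaustive enumeration of all $2^n$ paths of a symmetric simple random walk started at zero.

```python
from itertools import product
from math import comb


def rwz_pmf(n: int, x: int) -> float:
    """Assumed form of (76): probability of exactly x returns to zero in n steps."""
    half = n // 2
    if not 0 <= x <= half:
        return 0.0
    return comb(2 * half - x, half) / 2 ** (2 * half - x)


def returns_pmf_by_enumeration(n: int) -> list:
    """Exact law of the number of returns to zero, by listing all 2**n paths."""
    counts = [0] * (n // 2 + 1)
    for steps in product((-1, 1), repeat=n):
        position, returns = 0, 0
        for step in steps:
            position += step
            if position == 0:
                returns += 1
        counts[returns] += 1
    return [count / 2 ** n for count in counts]


if __name__ == "__main__":
    n = 8
    exact = returns_pmf_by_enumeration(n)
    formula = [rwz_pmf(n, x) for x in range(n // 2 + 1)]
    print(exact)
    print(formula)
```

The enumeration is exponential in $n$ and is only meant as a sanity check; the closed form is what makes the marginalisation over $S$ in Lemma 5 computable for large $n$.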


Lemma 5. Fix M >1 and 6 € {1,... ,M— 1}, let B be as defined in (74) and 
let B, S and V be random variables such that 


B I {V = V, S = s} ^ Beta-Binomial(u, s -I- 1, n — — s), (77) 

and 

5'|{t/ = u}~RWZ(n-'i;), V ~ Binomial ■ (78) 

Then B has the same distribution as B. 


6. Interpretation of results and discussion 

One of the main conclusions which can be drawn from our results thus far is quite negative: we have seen in Section 5 that for the IBPF and LEPF, the asymptotic variance can increase over time at an exponential rate. However, taken in isolation, this fact does not convey information about the relative performance of the two algorithms. The aim of this section is to address this matter, qualitatively and numerically.

In Section 6.1 we continue with a toy model for which we are able to numerically evaluate asymptotic variances without simulation and explain the behaviour we see in terms of the collision count $Z_n$. We also examine dependence on the parameters $M$ and $\theta$, compare asymptotic variance values with nonasymptotic values obtained by simulation, and explore the behaviour of the effective sample size. Section 6.2 considers a more realistic stochastic volatility model, and Section 6.3 provides some concluding perspectives and describes avenues for future investigation.
6.1. Evaluation of asymptotic variances

Recall that for the toy model of Section 5, the asymptotic variances for the IBPF and LEPF are proportional to $\mathbb{E}_{0,0}[e^{tZ_n}]$, where $Z_n$ counts collisions of the Markov chains with transition probabilities given by $\alpha_\infty$. Due to the interaction graph of the LEPF having only one connected component, versus several for the IBPF (as illustrated in the figures depicting the two interaction patterns), it seems natural to suppose that $Z_n$ is "typically" lower for the LEPF than for the IBPF, and thus the LEPF will exhibit lower asymptotic variance.

To explore this idea, we now use (73) and Lemma 5 to make numerical evaluations of $\mathbb{E}_{0,0}[e^{tZ_n}]$. We do so for the specific instance of the model (63) where


Xq ^ A/’(0,1) and g{x) = e (79) 

and define $t_0 := \log(\pi_0(g^2)/\pi_0(g)^2) \approx 0.1855077$.

Figure 3(a) shows $\mathbb{E}_{0,0}[e^{tZ_n}]$ vs. $n$ for the LEPF. Noting the logarithmic scale, the plot suggests that $\mathbb{E}_{0,0}[e^{tZ_n}]$ grows without bound as $n \to \infty$. In Figure 3(b), $R_n$ denotes the ratio of $\mathbb{E}_{0,0}[e^{tZ_n}]$ for the IBPF to that for the LEPF. It is apparent that $R_n$ is growing exponentially fast with $n$, suggesting the interaction structure of the LEPF has significant benefits in terms of asymptotic variance.

Figure 3(c) compares $R_n$ to the ratio $R_n^N$ of non-asymptotic mean square errors estimated by:




(80) 


where $\pi^{(i)}_{n,\mathrm{IBPF}}(\varphi)$ and $\pi^{(i)}_{n,\mathrm{LEPF}}(\varphi)$, $i = 1, \ldots, N_{\mathrm{mc}}$ with $N_{\mathrm{mc}} = 2000$, are independent approximations of $\pi_n(\varphi)$, with $\varphi = \mathrm{Id}$, obtained from the IBPF and LEPF. It is apparent that as $N$ grows, $R_n^N$ approaches $R_n$ and that the benefit of the LEPF over the IBPF becomes more substantial.

The main algorithmic difference between the LEPF and the IBPF is the number of particles exchanged between groups. For the IBPF, this number is 0; for the LEPF, it is specified by the parameter $\theta$. Figure 3(d) shows the behaviour of $R_n$ for different values of $\theta$. The results suggest that the highest value of $R_n$ is obtained when $\theta = M/2$, i.e. half of the particles in each group are exchanged.
Figure 3: (a) $\mathbb{E}_{0,0}[e^{tZ_n}]$ vs. $n$ for the LEPF with $\theta = 1$. (b) $R_n$ vs. $n$ for $\theta = 1$. (c) $R_n$ and $R_n^N$ vs. $n$ for $\theta = 1$, $M = 20$ and $t = t_0$. (d) $R_n$ vs. $\theta$ for $M = 20$ and $t = t_0$.
Figure 4: (a) Probability mass functions of $Z_n$ for IBPF and LEPF with $M = 3$, $\theta = 1$. (b) $\mathbb{E}_{0,0}[e^{tZ_n}]$ vs. $n$ for $M(n) = n^p$.


By (68), the behaviour of $\sigma_n^2$ is explained entirely by the distribution of $Z_n$. Figure 4(a) shows a comparison of these distributions in the case that $M = 3$ and $\theta = 1$, i.e. the same settings as in the earlier illustrative figure. By (67) the distribution of $Z_n$ for the IBPF is centred at $n/M$, while the corresponding distribution in the case of LEPF remains concentrated near 0 and, in particular, we observe that the distributions become increasingly distinct for large $n$.

To help illustrate the connection to the convergence results of Section 4.2, Figure 5 shows a simulation of $\mathcal{E}_n^{Mm}$ over 70000 time steps for the LEPF using the same model as before with $M = 20$ and $m = 50, 250, 500$. For each fixed value of $m$, $\mathcal{E}_n^{Mm}$ does not crash to zero and stick there, but rather it fluctuates to some extent, eventually as $n$ grows reaching values which are closer to 0 for larger values of $m$. It is relevant here to recall the possibly quite loose lower bound $\mathcal{E}_n^{Mm} \ge 1/m$ derived in (57). Informally, some connection between this phenomenon and the convergence rate can be observed in equation (55), where $(\mathcal{E}_n^{Mm})^{-1/2}$ appears in the $L_p$ error bound.

Figure 5: $\mathcal{E}_n^{Mm}$ vs. $n$ (top) and its running minimum on log-scale (bottom) for $M = 20$ and $m = 50, 250, 500$.

Lastly we consider the question of how $M = M(n)$ should be scaled with $n$ in order to prevent explosion of $\sigma_n^2$ as $n \to \infty$. So let $\sigma_n^2$ be as given in (61) but with $M$ replaced by $M(n)$. For the IBPF we see straightforwardly that if $\limsup_{n \to \infty} n/M(n) < \infty$, then by (69),

$$\limsup_{n \to \infty} \frac{\sigma_n^2}{\pi_0(\bar\varphi^2)} = \limsup_{n \to \infty} \Big(1 + \frac{c}{M(n)}\Big)^n < \infty.$$


We address the same issue for the LEPF through numerical evaluations again using the formulae of Section 5.3. Figure 4(b) shows the behaviour of $\mathbb{E}_{0,0}[e^{tZ_n}] = \sigma_n^2/\pi_0(\bar\varphi^2)$ for the LEPF with $M(n) = n^p$, $p = 0.75, 0.90, 1.00, 1.11, 1.33$. The results suggest that the "right" scaling may be $M(n) = n$, as for IBPF, in the sense that for $p > 1$, $\sigma_n^2$ tends towards $\pi_0(\bar\varphi^2)$, and for $p < 1$, $\limsup_n \sigma_n^2 = \infty$. We also note that for $M(n) = n$, we have from (79) for the IBPF that $\lim_{n \to \infty} \mathbb{E}_{0,0}[e^{tZ_n}] = e^{c} \approx 1.23$, whereas for the LEPF in Figure 4(b), with $M(n) = n$, it appears that $\limsup_{n \to \infty} \mathbb{E}_{0,0}[e^{tZ_n}] \approx 1.12$.
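For the IBPF the effect of the scaling $M(n) = n^p$ can be reproduced directly from the binomial moment generating function. The sketch below assumes $\mathbb{E}_{0,0}[e^{tZ_n}] = (1 + c/M(n))^n$ with $c = e^{t_0} - 1$ and the value $t_0 \approx 0.1855077$ quoted in Section 6.1; it does not cover the LEPF, whose evaluation requires the formulae of Section 5.3.

```python
from math import exp

T0 = 0.1855077           # value of t_0 quoted in the text
C = exp(T0) - 1.0        # c = exp(t_0) - 1


def ibpf_mgf(n: int, group_size: float, c: float = C) -> float:
    """(1 + c / M)^n for the IBPF, allowing a non-integer scaled M = n**p."""
    return (1.0 + c / group_size) ** n


if __name__ == "__main__":
    for p in (0.75, 1.00, 1.33):
        values = [ibpf_mgf(n, n ** p) for n in (10 ** 2, 10 ** 4, 10 ** 6)]
        # p < 1: values blow up; p = 1: values approach exp(c); p > 1: values approach 1.
        print(p, values)
    print(exp(C))
```

With $p = 1$ the sequence converges to $e^{c}$, which matches the value of approximately 1.23 reported for the IBPF.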
Figure 6: (a) Approximate relative variance vs. $n$. (b) Approximate error variance vs. $n$ for $N = 1000$.


6.2. Simulations 

We now examine whether some of the phenomena observed for the simplified model carry over to the case of a more realistic stochastic volatility model:

$$X_0 \sim \mathcal{N}(0,1), \quad X_{k+1} = aX_k + V_k, \quad V_k \sim \mathcal{N}(0, \sigma_V^2), \quad \forall\, k \ge 0,$$

$$Y_k = b \exp(X_k/2)\, \varepsilon_k, \quad \varepsilon_k \sim \mathcal{N}(0,1), \quad \forall\, k \ge 0. \qquad (81)$$

For the parameter values in the model we took $a = 0.9$, $b = 0.1$, $\sigma_V = 0.5$, and simulated a sequence of observations from the model. For the parameters of IBPF and LEPF we took $M = 20$ and $\theta = 1$.

Figure]^ shows the ratio (80) for A^mc = 10000, (p{x) = x. The true 
value of 7r„((p) was estimated with standard BPF using 10® particles. Roughly 
similar behaviour to that in Figure P can be observed, although of course 
for the stochastic volatility model we are not able to evaluate i?„. Figure]^ 
shows estimated mean square errors for IBPF and LEPF, proportional to the 


numerator and denominator in (801, respectively. 

Figure shows against n with M = 20 and to = 50 for a single run 
of each algorithm over 2 x 10^ time steps, for the stochastic volatility model 
(81). For both the LEPF and the IBPF, $\mathcal{E}_n^{Mm}$ never goes below $1/m = 0.02$, in accordance with (57), but it is notable that for the IBPF, $\mathcal{E}_n^{Mm}$ stays quite close to $1/m = 0.02$, whereas for LEPF, $\mathcal{E}_n^{Mm}$ fluctuates around higher values.
Figure 7: $\mathcal{E}_n^{Mm}$ vs. $n$ with $M = 20$ and $m = 50$.


6.3. Concluding remarks 

Although our results establish that the asymptotic variance for the LEPF can grow over time without bound, so that time-uniform convergence at rate $m^{-1/2}$ does not hold, our numerical experiments indicate that the errors from the LEPF may be substantially smaller than those from the IBPF, and that this difference can become more substantial as the time-horizon grows. The interaction structure of the LEPF therefore has clear benefits. The question of how to maximize these benefits, by considering variants of the LEPF arising from different $\alpha$ matrices, seems challenging. Outside of the toy model scenario, the formula for the asymptotic variance (61) is rather complicated. However, it can be written in terms of a composition of a sequence of non-negative integral operators. If the observations $(y_n)$ are treated as a random and stationary sequence, then the sequence of integral operators becomes also random and stationary. In light of this, Oseledec's theorem or similar results for non-negative integral operators may provide some tools to describe the rate of growth of the asymptotic variance over time.

More extreme modifications to the LEPF and IBPF may allow time-uniform convergence at rate $m^{-1/2}$ to be achieved. For instance, choosing $\alpha$ adaptively

in a time-varying manner so as to control the effective sample size can provably 
help to control errors El- The price to pay is that doing so may compromise the 
communication efficiency of the algorithm on a distributed computing architecture. Another possible approach is to stabilize the performance of the algorithm by artificially regulating the values taken by the weights and thus introduce some bias, but avoid degeneracy and prevent low values of effective sample size. A drawback of this approach is that it would compromise the lack-of-bias properties which validate the use of particle filters within particle MCMC. Rigorous treatment of these ideas is a potential topic for future research.
Appendix A. Auxiliary proofs

Lemma 6. The matrices defined in ([^ satisfy Assumption (2.4) 


Proof. For i,j £ Z such that \i — j\ > fi := M — 1 + 6 , hy (§ , = 0 and 

clearly holds. For \i — j\ < fi, we observe that by ([^ 



(A.l) 


and provided that i + kM,j + kM G [A], then by (2.2) we also have 


i mod AT, 7 mod AC 


_ (z+fcM) mod N,{j-\-kM) mod N 

— 


OtN 


(A.2) 
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So, to complete the verification of Q we shall, for each i,j e Z such that 
< /3j find k such that i + kM,j + kM € [A^] and check that _ 

^i+kMJ+kM 

^oo 

First consider the case j > i, and set A: = — [(i — 1)/MJ. In this case, by 
using ^ together with the assumptions that Mm > 2/3 + 1, and j — i < P we 
have 


^^kM,j+kM ^ + kM-9- 1))/MJ = 0] 

= M-H[j + kM - e e [M]] 

= + kM-0)^nodN-1)/M\ = 0] 

_ i-\-kMJ-\-kM 

— 5 


and moreover i + kM,j + kM € [N] holds and thus by (A.l I, (A.2) we have Q 
for j > i. 

For the case $i > j$, we can take $k = -\lfloor (i-1)/M \rfloor + m - 1$, for 
which $i+kM,\, j+kM \in [N]$, and similarly as above 

$$\alpha^{\infty}_{i+kM,\,j+kM} 
  = M^{-1}\, \mathbb{I}\big[(m-1)M \le j+kM-\theta-1 \le mM-1\big] 
  = \alpha^{N}_{i+kM,\,j+kM},$$

from which we conclude, by (A.1), (A.2), that (2.4) holds for all 
$i,j \in \mathbb{Z}$. □ 
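
The statement of the lemma can also be checked by brute force for small parameter values. The following Python sketch is illustrative only and is not part of the proof. It assumes that $\alpha^{\infty}_{ij} = M^{-1}\mathbb{I}[\lfloor (j-\theta-1)/M \rfloor = \lfloor (i-1)/M \rfloor]$, that $\alpha^{N}$ is the same expression with $j - \theta$ reduced modulo $N$ into $[N] = \{1,\ldots,N\}$, and that Assumption (2.4) requires $\alpha^{\infty}_{ij} = 0$ for $|i-j| > \beta$ and $\alpha^{N}_{i \bmod N,\, j \bmod N} = \alpha^{\infty}_{ij}$ for $|i-j| \le \beta$. These forms are inferred from the displays in the proof; the function names are hypothetical.

```python
from fractions import Fraction


def mod1(x, N):
    """Reduce x modulo N into {1, ..., N} (assumed convention)."""
    return (x - 1) % N + 1


def alpha_inf(i, j, M, theta):
    """Assumed form of the infinite matrix, for i, j in Z."""
    same_block = (j - theta - 1) // M == (i - 1) // M
    return Fraction(1, M) if same_block else Fraction(0)


def alpha_N(i, j, M, m, theta):
    """Assumed form of the N x N matrix, N = M * m, for i, j in [N]."""
    N = M * m
    assert 1 <= i <= N and 1 <= j <= N
    same_block = (mod1(j - theta, N) - 1) // M == (i - 1) // M
    return Fraction(1, M) if same_block else Fraction(0)


def check_lemma(M, m, theta, span=3):
    """Brute-force check of the assumed window condition on a finite range."""
    N = M * m
    beta = M - 1 + theta
    assert N >= 2 * beta + 1, "the proof assumes Mm >= 2*beta + 1"
    for i in range(-span * N, span * N + 1):
        for j in range(i - 2 * beta, i + 2 * beta + 1):
            a_inf = alpha_inf(i, j, M, theta)
            if abs(i - j) > beta:
                # Outside the band the infinite matrix must vanish.
                if a_inf != 0:
                    return False
            elif alpha_N(mod1(i, N), mod1(j, N), M, m, theta) != a_inf:
                # Inside the band the two matrices must agree.
                return False
    return True
```

For example, `check_lemma(3, 4, 1)` examines $M = 3$, $m = 4$, $\theta = 1$, for which $N = 12$ and $\beta = 3$. Exact rational arithmetic is used so that the comparisons do not depend on floating-point rounding.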


Proof of Lemma ??. Let $V$ be distributed as in (78), thus by (70) $V$ has the 
same distribution as the number of zero increments in $D$. Our strategy is to 
construct a collection of sequences $\{D^p\}_{0 \le p \le V}$ and random 
variables $\{B^p\}_{0 \le p \le V}$ such that $D^V$ and $B^V$ have the same 
distributions as $D$ and $B$, respectively. The construction is done in a 
manner that allows us to identify explicitly the distribution of $B^V$ and 
hence the distribution of $B$. 

To start, take a sequence $D^0 := (D^0_k)_{0 \le k \le n-V}$ where $D^0_0 = 0$ 
and the increments $(D^0_k - D^0_{k-1})_{1 \le k \le n-V}$ are i.i.d. with 
common distribution $\delta_{-1}/2 + \delta_{1}/2$. We then define sequences 
$D^p := (D^p_k)_{0 \le k \le n-V+p}$ for $1 \le p \le V$ recursively 

$$D^p := \big(D^{p-1}_0, \ldots, D^{p-1}_{K_p}, D^{p-1}_{K_p}, \ldots, 
  D^{p-1}_{n-V+p-1}\big), \tag{A.3}$$

where $K_p$ is a uniform random variable on the set $\{0, \ldots, n-V+p-1\}$, 
for all $1 \le p \le V$. 
By this construction, $D^0$ is of length $n - V + 1$ and has only non-zero 
increments. $D^1$ is of length $n - V + 2$ and has exactly one zero increment 
at a uniformly random location. Finally, $D^V$ is of length $n + 1$ and has 
exactly $V$ zero increments at uniformly random locations and hence can be 
checked to have the same distribution as $D$. 
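
The construction can be illustrated with a short simulation. The Python sketch below is a minimal illustration, assuming that the recursion (A.3) repeats the entry of $D^{p-1}$ at the uniformly chosen position $K_p$; the names `build_sequences` and `zero_increments` are hypothetical and do not appear in the paper.

```python
import random


def build_sequences(n, V, rng):
    """Return the list [D^0, ..., D^V] built by repeated duplication."""
    assert 0 <= V <= n
    # D^0: simple random walk started at 0 with n - V increments of +/-1.
    walk = [0]
    for _ in range(n - V):
        walk.append(walk[-1] + rng.choice((-1, 1)))
    sequences = [walk]
    for _ in range(V):
        prev = sequences[-1]            # D^{p-1}, indices 0, ..., n-V+p-1
        K = rng.randrange(len(prev))    # K_p uniform on {0, ..., n-V+p-1}
        # Repeating the entry at position K inserts one zero increment.
        sequences.append(prev[:K + 1] + prev[K:])
    return sequences


def zero_increments(D):
    """Number of indices k >= 1 with D[k] == D[k-1]."""
    return sum(1 for a, b in zip(D, D[1:]) if a == b)
```

With `seqs = build_sequences(10, 4, random.Random(0))`, each `seqs[p]` has length `10 - 4 + p + 1` and `zero_increments(seqs[p]) == p`, in line with the lengths and zero-increment counts described above.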

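The random variables $\{B^p\}_{0 \le p \le V}$ are defined as 

$$B^p := \sum_{k=1}^{n-V+p} \mathbb{I}[D^p_{k-1} = 0]\, \mathbb{I}[D^p_k = 0], 
  \qquad \forall\, 0 \le p \le V, \tag{A.4}$$

for which we have, by (A.3), the recursive expression 

$$B^p = B^{p-1} + \mathbb{I}[D^{p-1}_{K_p} = 0], 
  \qquad \forall\, 0 < p \le V. \tag{A.5}$$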
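
The recursion (A.5) can be checked numerically along a simulated path. The sketch below is an informal sanity check under the assumption that (A.3) duplicates the entry at position $K_p$ and that $B^p$ counts pairs of consecutive zeros as in (A.4); `count_B` and `check_recursion` are hypothetical names.

```python
import random


def count_B(D):
    """Number of k >= 1 with D[k-1] == 0 and D[k] == 0, as in (A.4)."""
    return sum(1 for a, b in zip(D, D[1:]) if a == 0 and b == 0)


def check_recursion(n, V, seed):
    """Return True if B^p = B^{p-1} + I[D^{p-1}_{K_p} = 0] for p = 1..V."""
    rng = random.Random(seed)
    # D^0: simple random walk with n - V non-zero increments.
    D = [0]
    for _ in range(n - V):
        D.append(D[-1] + rng.choice((-1, 1)))
    B_prev = count_B(D)
    for _ in range(V):
        K = rng.randrange(len(D))          # uniform position in D^{p-1}
        hit = 1 if D[K] == 0 else 0        # indicator that D^{p-1}_{K_p} = 0
        D = D[:K + 1] + D[K:]              # duplicate the entry at position K
        B_new = count_B(D)
        if B_new != B_prev + hit:
            return False
        B_prev = B_new
    return True
```

The check reflects the mechanism behind (A.5): duplicating a zero entry lengthens a run of zeros by one and so adds exactly one pair of consecutive zeros, while duplicating a non-zero entry leaves the count unchanged.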


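By the definition of $K_p$, $\mathbb{I}[D^{p-1}_{K_p} = 0]$ in (A.5) is a 
Bernoulli random variable with success probability 

$$\frac{\sum_{k=0}^{n-V+p-1} \mathbb{I}[D^{p-1}_k = 0]}{n-V+p} 
  = \frac{1 + B^{p-1} + \sum_{k=1}^{n-V+p-1} \mathbb{I}[D^{p-1}_{k-1} \ne 0]\, 
    \mathbb{I}[D^{p-1}_k = 0]}{n-V+p}. \tag{A.6}$$

Define $S := \sum_{k=1}^{n-V} \mathbb{I}[D^0_k = 0]$ when $V < n$, and $S := 0$ 
when $V = n$, so that $S$ is the number of times zero occurs in the sequence 
$D^0$, excluding the first element. Then by induction one can check that for 
all $0 \le p \le V$, 

$$S = \sum_{k=1}^{n-V+p} \mathbb{I}[D^p_{k-1} \ne 0]\, \mathbb{I}[D^p_k = 0],$$

and hence by (A.6) 

$$\mathbb{I}[D^{p-1}_{K_p} = 0] \sim 
  \mathrm{Bernoulli}\!\left(\frac{1 + S + B^{p-1}}{n-V+p}\right). \tag{A.7}$$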
The key observation is that by (A.5) and (A.7), the sequence 
$(\mathbb{I}[D^{p-1}_{K_p} = 0])_{0 < p \le V}$ is distributed according to a 
Polya's urn model, for which we have readily (see, e.g. [13]) 

$$\sum_{p=1}^{V} \mathbb{I}[D^{p-1}_{K_p} = 0] \sim 
  \text{Beta-Binomial}(V,\, S+1,\, n-V-S), \tag{A.8}$$

with the convention that Beta-Binomial$(k, 1, 0)$ corresponds to point mass 
$\delta_k$ for any $k \ge 0$. 
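
The link between the urn dynamics and the Beta-Binomial law can be verified exactly for small parameter values. The sketch below assumes the parametrisation $a = S + 1$, $b = n - V - S$, so that the $p$-th draw succeeds with probability $(a + x)/(a + b + p - 1)$ when $x$ successes have already occurred, which matches (A.7) because $a + b + p - 1 = n - V + p$. The function names are hypothetical, and the case $b = 0$ is handled through the point-mass convention stated above.

```python
from fractions import Fraction
from math import comb, factorial


def urn_distribution(V, a, b):
    """Exact law of the number of successes after V urn draws."""
    dist = {0: Fraction(1)}
    for p in range(1, V + 1):
        total = a + b + p - 1
        new = {}
        for x, prob in dist.items():
            p_success = Fraction(a + x, total)
            new[x + 1] = new.get(x + 1, 0) + prob * p_success
            new[x] = new.get(x, 0) + prob * (1 - p_success)
        # Drop states of probability zero (they arise only when b == 0).
        dist = {x: q for x, q in new.items() if q != 0}
    return dist


def beta_fn(p, q):
    """Beta function B(p, q) for positive integers p and q."""
    return Fraction(factorial(p - 1) * factorial(q - 1), factorial(p + q - 1))


def beta_binomial_pmf(V, a, b):
    """Beta-Binomial(V, a, b) pmf for integers a >= 1 and b >= 0."""
    if b == 0:
        return {V: Fraction(1)}  # point-mass convention
    return {
        x: comb(V, x) * beta_fn(x + a, V - x + b) / beta_fn(a, b)
        for x in range(V + 1)
    }
```

For instance, `urn_distribution(2, 1, 1)` and `beta_binomial_pmf(2, 1, 1)` both give the uniform distribution on $\{0, 1, 2\}$. Rational arithmetic is used so that the two laws can be compared for exact equality.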

To conclude the proof we observe that because the increments of $D^0$ are 
non-zero, we have $B^0 = 0$ and hence, by (A.5), 
$B^V = \sum_{p=1}^{V} \mathbb{I}[D^{p-1}_{K_p} = 0]$. Therefore, by (A.8), 
$B^V \sim \text{Beta-Binomial}(V,\, S+1,\, n-V-S)$. It remains to point out 
that because $D^0$ is a simple random walk, we know by [22, Theorem 2] that $S$ 
is distributed as described in (78). Finally, since $D^V$ has the same 
distribution as $D$, then by (74) and (A.4), $B^V$ must have the same 
distribution as $B$, concluding the proof. □ 